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METHODS  FOR  REDUCING  COMPUTATIONAL  COSTS  OF  TYPICAL 
FINITE  ELEMENT  UNSTEADY  HYDRODYNAMIC  MODELS 


PART  I :  INTRODUCTION 

1.  In  general,  either  the  finite  difference  method  or  the  finite  ele¬ 
ment  method  are  employed  to  obtain  numerical  solutions  of  the  partial  differ¬ 
ential  equations  governing  the  motion  of  bodies  of  water.  Historically,  the 
majority  of  numerical  hydrodynamic  models  have  employed  the  finite  difference 
method.  The  major  reason  for  this  is  probably  because  the  replacement  of  de¬ 
rivatives  by  divided  differences  is  much  easier  for  the  typical  engineer  to 
understand  than  basic  ideas  in  variational  calculus,  upon  which  the  finite 
element  method  is  based.  However,  as  will  be  demonstrated,  the  two  methods 
share  much  common  ground.  In  fact,  most  computational  methods  can  be  derived 
from  the  method  of  weighted  residuals,  with  the  major  difference  in  the 
methods  determined  by  the  selection  of  the  weighting  function  to  which  the 
solution  error  is  made  orthogonal.  Undoubtedly,  the  major  reason  the  finite 
element  method  came  to  be  applied  in  numerical  hydrodynamic  modeling  is  be¬ 
cause  of  its  ability  to  resolve  complex  geometry  in  the  computational  domain. 
In  particular,  the  ability  to  resolve  winding  navigation  channels  in  estuarine 
sedimentation  studies  is  virtually  a  necessity.  However,  despite  its  ability 
to  resolve  complex  geometry  in  the  physical  domain,  the  finite  element  method 
suffers  from  the  fact  that  it  is  relatively  more  complicated  and  expensive  to 
program.  In  addition,  most  existing  hydrodynamic  finite  element  models  appear 
to  require  significantly  more  computational  effort  per  time-step.  However, 
this  may  be  attributed  to  the  techniques  commonly  employed  by  finite  element 
modelers  in  solving  the  resulting  matrix  equation  system  rather  than  the 
method  itself. 

2.  A  good  example  of  typical  finite  element  hydrodynamic  models  is  the 
set  of  models  developed  by  Resource  Management  Associates  (RMA)  for  the  U.S. 
Army  Corps  of  Engineers  Waterways  Experiment  Station's  Hydraulics  Laboratory. 
Codes  for  laterally  averaged  water  bodies  (RMA-7) ,  vertically  averaged  estu¬ 
aries  and  bays  (RMA-2)  and  completely  three-dimensional  (RMA-10)  have  been 
developed.  These  codes  are  rather  general  in  that  non-structured  grids  that 
can  be  composed  of  combinations  of  element  types,  e.g.  quadrilaterals, 


triangles,  etc.  are  employed.  Since  the  governing  equations  are  not  linear¬ 
ized,  Newton  iteration  is  required  to  solve  the  resulting  set  of  nonlinear 
algebraic  equations.  At  each  iteration  of  the  Newton  scheme  a  set  of  linear 
equations  is  solved  using  a  form  of  Gaussian  elimination  called  a  frontal 
solver . 

3.  Such  models  can  be  efficiently  applied  in  studies  where  steady  state 
solutions  are  desired.  However,  even  two-dimensional  computations  can  become 
costly  for  relatively  long  term  (many  tidal  cycles)  simulations  on  relatively 
modest  grids,  bong  term  three  dimensional  simulations  are  virtually  impos¬ 
sible  from  an  economic  standpoint  with  such  solution  schemes. 

4.  The  major  purpose  of  this  study  was  to  suggest  ideas  for  reducing 
the  computational  time  of  typical  finite  element  hydrodynamic  models.  The 
approach  taken  has  been  to  consider  "quick  fix"  ideas,  e.g.  implementation  of 
explicit  time  integration  with  a  diagonalized  mass  matrix,  as  well  as,  a  more 
"permanent  fix"  involving  a  tensor  product  implicit  solution  programmed  for 
parallel  processing.  The  comprehensive  permanent  solution  suggested  has  bene- 
fitted  from  the  substantial  progress  made  in  aerodynamics  computat ional  fluid 
dynamics  research  over  the  past  decade.  The  algorithm  construction  presented 
represents  a  step  forward  in  top-down  CFD  algorithm/code  design  from  theory 
through  to  parallel  processing  organization. 

5.  Before  presenting  ideas  for  increasing  the  computational  efficiency 
of  typical  finite  element  hydrodynamic  models,  features  of  both  the  finite 
difference  and  finite  element  methods  are  discussed.  This  discussion  focuses 
on  the  common  basis  of  the  two  computational  methods  as  well  as  reasons  why 
the  finite  element  method  can  be  so  costly. 


PART  11:  SOLUTION  METHODS 


Types  of  Numerical/Free-Surf ace  Hydrodynamic  Models 

6.  Numerical  hydrodynamic  models  can  differ  widely,  depending  upon  such 
things  as  the  solution  technique  applied  to  the  governing  differential  equa¬ 
tions  representing  the  physical  processes,  the  assumptions  made  in  the  deriva¬ 
tion  of  the  governing  equations,  whether  the  phenomena  are  steady  or  time- 
varying,  and  the  spatial  dimensionality  considered. 

7.  If  the  complete  three-dimensional  (3-D)  equations  of  motion  are 
integrated  over  a  cross  section,  one-dimensional.  (1-D)  models  result.  Such 
models  are  commonly  applied  in  computing  river  hydrodynamics,  e.g.,  the  com¬ 
putation  of  floods.  Averaging  over  either  the  depth  or  the  width  results  in 
two-dimensional  (2-U)  models.  Vertically  averaged  models  are  applicable  for 
the  computation  of  nearly  horizontal  flow  in  relatively  shallow  and  well-mixed 
bodies  of  water,  whereas  laterally  averaged  models  are  appropriate  when  deal¬ 
ing  with  relatively  narrow  and  deep  bodies  of  water  experiencing  vertical 
stratification  of  the  water  density. 

8.  Even  though  a  free  surface  exists  on  open  bodies  of  water,  some 
modelers  have  treated  the  surface  as  a  rigid  lid  when  very  little  motion  of 
the  free  surface  occurs.  The  surface  then  becomes  in  essence  a  solid  boundary 
and  the  normal  component  of  the  velocity  must  be  zero.  In  addition,  the  pres¬ 
sure  can  no  longer  be  prescribed  at  the  surface  but  rather  must  be  computed. 
The  pressure  boundary  condition  then  takes  the  form  of  a  derivative  boundary 
condition,  i.e.,  a  Neumman  condition  as  opposed  to  a  Dirichlet  condition  in 
the  true  free-surface  case.  All  hydrodynamic  modeling  discussed  in  this 
report  treats  the  surface  as  being  free  to  move  so  that  free-surface  waves, 
e.g.,  tidal  waves  in  estuaries,  are  free  to  be  computed. 

9.  The  governing  hydrodynamic  equations  are  nonlinear  partial  differ¬ 
ential  equations,  which  in  a  strict  mathematical  sense  are  classified  as  being 
of  the  parabolic  type.  However,  outside  the  boundary  layer  the  equations  ex¬ 
hibit  a  strong  hyperbolic  or  wave  character  due  to  the  dominance  of  the  con¬ 
vective  terms  and  thus  are  often  considered  as  being  of  the  hyperbolic  type. 

In  any  case,  because  of  the  nonlinearity,  analytical  solutions  do  not  gener¬ 
ally  exist  and  one  must  resort  to  numerical  methods  to  obtain  an  approximation 
of  the  continuous  solution  of  the  differential  equations. 


Such  methods 


consist  primarily  of  the  use  of  either  finite  differences  or  finite  elements. 


Finite  Difference  Method 


10.  In  the  finite  difference  method,  the  domain  of  the  independent 
variables  is  replaced  by  a  finite  set  of  points,  referred  to  as  net  or  mesh 
points,  which  are  structured,  i.e.,  there  are  ordered  directions  in  the  mesh. 
One  then  seeks  to  determine  approximate  values  for  the  desired  solutions  at 
these  points.  The  values  at  the  mesh,  points  are  required  to  satisfy  differ¬ 
ence  equations  that  can  be  derived  in  several  ways,  although  they  are  usuallv 
obtained  by  replacing  partial  derivatives  by  difference  quotients.  Three 
approaches  are  presented  below. 

Polynomial  fitting 

11.  With  the  polynomial  approach  the  dependent  variables  are  repre¬ 
sented  by  a  polynomial  function  with  the  coefficients  determined  by  the  method 
of  collocation,  i.e.,  evaluation  of  the  polynomial  at  the  net  points.  The 
polynomial  is  then  analytically  differentiated  to  yield  difference  expressions 
for  various  order  derivatives. 

12.  Consider  the  representation  in  one  dimension  of  a  dependent  vari¬ 
able  as  a  quadratic,  i.e. 


f(x)  =  a  +  bx  +  cx‘ 


(1) 


Evaluating  f (x)  at  the  mesh  points  shown  below 
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from  which 


a  =  f . 
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fi+l  +  f i-1  "  2fi 


Now,  differentiating  Equation  (1)  and  substituting  the  expressions  from  Equa¬ 
tions  (5)  -  (7)  yields  the  centered  difference  expressions  below  for  first  and 
second  order  derivatives. 
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13.  As  will  be  seen  later,  the  idea  of  representing  the  dependent  vari¬ 
ables  by  piecewise  continuous  polynomials  is  basic  to  the  finite  element 
method.  However,  unlike  the  approach  taken  above,  the  method  of  collocation 
is  not  employed. 

Integral  method 

14.  In  the  integral  method,  the  governing  equation  is  approximately 
satisfied  in  an  integral  rather  than  a  differential  sense.  Consider  the 
ID  equation 


30  _  _  3 (0u) 

3t  “  3x 


with  an  integration  over  the  spatial  domain  from  (x  -  Ax/2)  to  (x  +  Ax/2) 
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Hy  the  mean-value  theorem  of  calculus 
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where  x  lies  between  (x-Ax/2)  and  (x+Ax/2).  Using  a  mid  point  evaluation 
yields 


0.  =  - 
l 


(0u)1+i/2  -  (0u)._1/2 


which  can  be  solved  for  0  using  some  time  integration  scheme.  Note  that 
once  again  the  replacement  of  the  spatial  derivative  is  the  equivalent  of  a 
centered  difference  expression. 

15.  The  idea  of  satisfying  the  differential  equation  in  an  integral 
sense  rather  than  a  differential  sense  is  also  basic  to  the  finite  element 
method.  However,  in  the  finite  element  method,  rather  than  setting  the  inte¬ 
gral  of  the  equation  over  the  spatial  domain  equal  to  zero,  the  integral  of 
the  solution  error  multiplied  by  a  non  constant  weighting  function  is  set  to 


Taylor  series  expansion 

16.  The  most  common  approach  taken  in  the  finite  difference  method  is 
to  use  Taylor  series  expansions  to  derive  difference  quotients  that  replace 
the  partial  derivatives  in  the  governing  equations.  For  example,  f(x) 
evaluated  at  x  =  i+ 1  would  be  written 


and  thus 
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where  O(Ax)  refers  to  additional  terms  with  factors  of  size  Ax  .  Kquation  15 
is  referred  to  as  a  forward  difference  approximation  to  the  first  derivative 
of  i(x).  Expanding  a  Taylor  series  backwards  to  x=i-l  yields  a  backwards 
difference  expression.  A  subtraction  of  the  forward  and  backward  expression 
yields  the  centered  difference  expression  presented  in  Equation  8.  Expres¬ 
sions  for  higher  order  derivatives  of  various  degrees  of  accuracy  can  be 
derived  in  a  similar  fashion.  The  finite  difference  approximation  of  the  dif¬ 
ferential  equation  is  then  obtained  by  replacing  the  partial  derivatives  by 
the  difference  quotients.  Roache  (1972)  presents  an  excellent  discussion  of 
the  use  of  the  finite  difference  method  in  solving  partial  differential 
equations. 

17.  All  of  these  approaches  to  deriving  a  finite  difference  recursive 
algorithm  can  lead  to  the  same  expressions  under  certain  conditions.  However, 
this  is  not  true  in  general.  For  example,  beyond  the  second  order  polynomial 
the  expressions  obtained  are  not  identical  to  those  from  higher  order  Taylor 
series  expansions.  Similar  results  will  be  demonstrated  for  the  finite  ele¬ 
ment  method.  For  example,  the  finite  element  method  always  yields  the  appro¬ 
priate  order-accurate  finite  difference  recursion  relation  on  a  uniform 
discretization  in  one  dimension  if  the  governing  differential  equation  is 
linear  and  linear  polynomials  are  employed.  This  is  not  normally  true  for 
nonlinear  equations  and/or  nonuniform  discretizations  and/or  multidimensional 
problems. 


Finite  Element  Method 


18.  In  the  finite  element  approach,  the  field  is  divided  into  smaller 
regions  of  convenient  shapes,  such  as  triangles  or  quadrilaterals,  and  the 
solution  is  approximated  on  each  element  by  interpolation  from  nodal  values  on 
the  element.  Using  a  variational  principle  for  Sturm-Liouvi lie  type  equa¬ 
tions,  or  a  weighted-residual  method  for  general  equations,  the  partial  dif¬ 
ferential  equations  are  then  transformed  into  finite  element  equations 
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governing  each  isolated  element.  These  local  equations  are  then  collected  to 
form  a  global  system  of  ordinary  differential  (in  time),  or  algebraic,  equa¬ 
tions  including  a  proper  accounting  of  boundary  conditions.  The  nodal  values 
of  the  dependent  variables  are  then  determined  from  solution  oi  this  mat r in 
equat Lon  system.  Baker  (1983)  introduces  the  method  in  the  following  manner 
"the  finite-element  algorithm  is  perhaps  most  easily  interdicted 
as  an  approximate  transformation  of  a  partial  differential  equa¬ 
tion  (system)  into  a  larger  system  oi  lower  order  differential 
equations.  The  computational  mesh  is  formed  by  the  union  (summa¬ 
tion)  of  nonoverlapping  subdomains,  called  finite  elements,  and 
the  mesh  can  be  "arbitrarily"  nonregular.  Approximation  polyno¬ 
mials  (of  degree  k.)  are  prescribed  on  a  local  basis  within  each 
finite  element,  to  represent  all  dependent  variables  and  param¬ 
eters,  such  as  viscosity,  Reynolds  stress,  and  thermal  conductiv¬ 
ity.  These  approximation  basis  function  sets  are  "cardinal," 
i.e.,  they  reduce  to  zeroes  and  ones  at  predetermined  locations, 
within  or  on  the  boundary  of  each  computational  subdomain  called 
nodes.  The  boundary  of  the  solution  domain  need  not  coinc ide  with 
surfaces  of  a  global  coordinate  system.  Nevertheless,  the 
"natural"  elliptic  boundary  condition  specification,  relating  the 
dependent  variable  and  its  normal  derivative,  is  routinely  imple¬ 
mented  along  any  (all)  disconnected  portions  of  the  solution  domain 
boundary.  The  ultimate  algebraic  equation  solution  matrix  struc¬ 
ture  produced  by  the  finite  element  algorithm  is  sparse  and  handed, 
with  band  width  being  a  function  of  problem  dimension,  the  degree 
of  interpolation,  and  the  discret ization  node  numbering.  The 
resultant  system  is  directly  solvable,  or  an  iterative  procedure 
may  be  developed  using  tensor  matrix  products  or  other  formulations. 
For  parent  equation  systems  exhibiting  initial  value  character,  the 
algorithm  yields  a  system  of  ordinary  differential  equations  that  can 
be  integrated  using  any  explicit  or  implicit  procedure." 


Variational  approach 


19.  In  contrast  to  the  calculus  of  a  function,  the  finite  element 


method  is  based  upon  the  calculus  of  variations.  In  general,  if  vie  wish  to 
find  functions  <f^(x,y)  that  are  differentiable  on  (x,y)  and  satisfy  fixed 
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constraints  on  the  boundary  of  the  domain  that  minimize  some  integral 
funct ional 


(16) 


the  calculus  of  variations  determines  that  <t^(x,y)  are  the  solutions  of  the 
Euler-Lagrange  equations 
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Thus,  if  the  problem  is  to  determine  solutions  of  linear  partial  differential 
equations  similar  to  equation  17  one  can  instead  extremize  the  integral  state¬ 
ment  given  by  equation  16  once  the  functional  has  been  determined  by  inspec¬ 
tion.  Identification  of  the  functional 
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is,  of  course,  a  crucial  step.  Construction  of  the  finite  element  solution  is 
based  upon  an  approximate  evaluation  of  the  integral,  followed  by  construction 
of  the  extremum. 

20.  In  the  evaluation  of  the  integral,  a  functional  form  must  be 
assumed  for  4  which  contains  expansion  coefficients.  From  a  review  of 
properties  of  fundamental  solutions  of  Sturm-Liouville  type  equations  such  as 
Equation  17,  Baker  (1983)  discusses  the  desirable  properties  of  orthogonality 
and  completeness  and  their  relationship  to  the  selection  of  the  functional 
form  for  <f  .  The  basic  definition  of  these  two  properties  are  given  below. 
Two  functions  are  orthogonal  on  the  interval  x,  ^  x  ^  x9  if 
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where  5  is  the  Kronecker  delta.  The  finite*  element  procedure  leads  to  the 
nm 

establishment  of  locally  orthogonal  approximation  functions,  i.e.  Kquation  1 '< 
is  satisfied  with  the  exception  that  the  integral  fails  to  vanish  on  an  indi¬ 
vidual  finite  element.  The  orthogonality  property  is  important  since  it  re¬ 
sults  in  sparse  coefficient  matrices  in  the  resulting  linear  algebra  problem. 

21.  A  function  set  L'^(x)  is  del  ined  as  complete  on  Xj  \  •  x  if 

foi  every  6  >  0  there  exists  a  number  N  >  0  such  that  the  dist  nice  between 

an  arbitrary  square  integrable  function  f(x)  and  a  series  expansion  in 

1  (x)  can  be  made  arbitrarily  small,  i.e. 

n 
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Wit i i  tliis  property  an  increase  in  accuracy  for  the  solution  f(x)  as  N 
increases  is  guaranteed. 

22.  The  use  of  local  polynomials  as  the  approximating  functions  pro¬ 
vides  some  degree  of  orthogonality  coupled  with  the  assurance  of  reducible 
error  as  either  the  grid  is  refined  or  as  the  degree  of  the  polynomial  is 
increased,  i.e.,  as  a  more  complete  basis  is  utilized.  Other  functions  could 
be  used,  e.g.,  completely  orthogonal  functions  such  as  Bessel  functions  or 
perhaps  a  polynomial  spanning  the  entire  spatial  domain  rather  than  an  indi¬ 
vidual  element.  However,  as  Baker  (1983)  notes,  the  use  of  completely  ortho¬ 
gonal  functions  severely  constrains  boundary  condition  flexibility,  although 
it  does  admit  improved  solution  accuracy  with  the  use  of  a  more  complete 
basis.  The  use  of  polynomials  spanning  the  complete  domain  results  in  no 
degree  of  orthogonality  and  thus  the  coefficient  matrix  that  is  formed  in  the 
linear  algebra  problem  becomes  extremely  dense  and  numerical  error  eventually 
swamps  the  solution  as  higher  degree  polynomials  are  employed.  The  finite 
element  method  is  synonymous  with  the  use  ol  piecewise  local  interpolation 
polynomials  for  approximating  functions. 

Method  of  weighted  residuals 

23.  Since  the  equations  of  fluid  mechanics  are  written  in  an  Kulerian 
frame  of  reference  they  are  nonlinear.  Thus  the  variational  approach  to  the 
finite  element  method  cannot  be  used  since  the  functional  given  hv  Kquation  IK 
cannot  be  found.  Therefore,  for  nonlinear  equations  the  finite  element  method 


is  based  upon  the  method  of  weighted  residuals.  As  a  matter  of  fact,  virtu¬ 
ally  all  computational  methods  can  be  derived  in  this  manner. 

24.  As  discussed  by  Baker  (1983),  the  fundamental  concept  is  the  con¬ 
straint  of  the  solution  error.  One  deals  directly  with  the  differential  equa¬ 
tion  L(q)  and  the  boundary  condition  £(q)  ,  where  L(‘)  and  )6  ( * )  are 
differential  operators.  Assuming  that  qh  is  the  approximate  solution,  then 
L(q  )  and  £(q  )  are  statements  of  the  error  in  the  solution  approximation. 
This  error  is  then  required  to  be  orthogonal  to  some  set  of  weighting  func¬ 
tions  W(x)  over  the  domain  and  its  boundary  i.e.. 


jf  |w(x)|h(qh)dx  ~  X  J  W(x)!(qh)do  =  lO) 


where 

X  =  Arbitrary  multiplier 
x  =  x,  y,  z 

SI  =  Computational  domain 
=  Boundary  of  domain 
do  =  Boundary  area  differential 
dx  =  Domain  volume  differential 

There  are  as  many  scalar  equations  as  members  of  the  weight  set  (w(x))  .  The 
next  step  is  to  write  Equation  21  as  the  global  assembly  of  the  equation 
applied  over  each  element,  i.e. 


where  ,  M 

q  (x)  -  V  q  (x)  (23) 

e  =  l 

and  M  is  the  total  number  of  elements.  The  basic  distinction  between  the 
finite  element  method  and  other  computational  methods,  e.g.  finite  volumes,  is 
the  selection  of  the  weight  function  basis  (w(x) 

25.  Baker  (1983)  demonstrates  that  in  order  to  reproduce  the  energy 
functional  extremizat ion  tor  the  linear,  steady  state  heat  conduction  problem 
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the  weight  function  basis  must  employ  the  same  basis  as  used  in  the  appro:  i 
t  Lon  of  the  so  lution,  i.e. 


q  (x)  -  IN  (x)  }  {({} 

^e  k  i 


and  thus 


{W(x)}  =  {N^(x)} 


The  comparison  is  exact  except  for  a  closed  surface  integral  over  the  boundary 
of  each  element.  For  the  analogy  to  be  exact  this  integral  must  be  zero. 

This  is  not  the  case  for  an  individual  element  but  the  assembly  of  the  surface 
integral  over  the  complete  domain  does  yield  a  zero  value.  However,  this 
imp  Lies  that  the  derivative  of  a  dependent  variable  may  be  discontinuous  at 
element  intersections. 

26.  A  finite  difference  algorithm  can  be  derived  by  setting  the  weight 
function  basis  {W(x)l  to  be  a  set  of  constants.  The  surface  integral  noted 
above  then  becomes  zero  over  each  element  since  a  derivative  of  fW(x)} 
appears  in  the  integral  and  the  derivative  of  a  constant  is  zero.  Therefore, 
derivatives  of  the  dependent  variables  are  continuous  at  finite  difference 
cell  interfaces.  This  is  a  fundamental  difference  between  the  two  methods  as 
derived  by  the  method  of  weighted  residuals.  The  reason  for  defining  the 
finite  element  method  by  forcing  the  method  of  weighted  residuals  to  reproduce 
the  variational  problem  for  a  linear  problem  is  because  the  theoret ical  struc¬ 
ture  tor  such  linear  problems  guarantees  an  optimally  accurate  solution. 

The  linear  Algebra  Problem 

27.  Consider,  for  example,  the  depth-nveraged  2-H  equations  for  f roe- 
surf. ice  hydrodynamic  flows.  A  finite  element  (FF.)  algorithm  for  these  equa¬ 


tions  formally  states  the  requirement  to  establish  an  approximation  q  (•!  to 
the  solution  set  q(x,t)  -  H('),  u^C*))  where  is  the  water  surface  and 


u.  is  the  velocity,  to  be  constructed  on  a  discretization  52  of  the  solu- 

tier,  domain  52  c  >'  t  o{x,:  1  '  i  <  2 ,  t  >  t  }  .  This  nnnroxim.it  ion  is 
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constructed  as  the  union  of  elemental  contributions  q  (x.,t)  ,  which  in  turn 
are  defined  as  expansions  in  the  k  degree  polynomial  cardinal  basis  set 
IN ,(•)}  and  time-dependent  nodal  expansion  coefficients  { Q ( t ) }  .  Mathemat- 

ically,  this  statement  of  approximation  is 


U  qe(xi,t) 


where 


qe  ’=  {Nk(xi)}  {C^(t^e 


28.  Since  Equations  26-27  define  the  approximation,  substitution  into 
the  flow  equations  defines  the  associated  error.  The  second  step  of  a  FE 
algorithm  is  therefore  a  formal  statement  of  constraint  on  this  error.  The 
form  of  this  constraint  which,  as  previously  noted,  enjoys  an  optimal  error 
estimate  for  a  linear  elliptic  problem  statement,  is  the  so-called  Calerkin 
statement,  which  requires  the  distribution  of  the  error  to  be  orthogonal  to 
the  function  space  used  to  construct  q*1  .  For  the  more  general 

problem  statement  wherein  nonsmooth  solutions  to  hyperbolic  equations  are 
sought,  the  Galerkin  statement  is  typically  augmented  with  an  additional  dis¬ 
sipative  constraint,  Baker  (1983).  The  mathematical  statement  of  error  con¬ 
straint  is  thus  of  the  form 


2(Nk(*)}  L(qh)  +~B* 


2V{Nk(-) )LC  (qh)  2  (0} 


A  Q 

where  6  is  a  parameter  set  that  can  be  optimized  and  1.  (•)  is  the  sub¬ 
stantial  derivative  operator. 

29.  Upon  evaluation  of  the  integrals  in  Equation  28,  there  results  an 
ordinary  differential  equation  system  written  on  the  time  evolution  of  the 
expansion  coefficient  set  {Q(t)}  in  the  form 


where 


$ 


*.V 


*8 


[M]  =  mass  matrix 

{U}  =  convection  velocity  field 


[C]  =  associated  (convective  derivative)  influence  on  the  dependent 
variable  set  { Q } 


{G}  =  remaining  source  terms 


For  the  time-accurate  solution.  Equation  29  is  employed  to  evaluate  the  Taylor 
series 


lF}  £  (Q}.+i  _  (Q}.  -  At  2-  (Q).+e 
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Equation  30  is  a  nonlinear  algebraic  equation  system  for  P  >  0  ,  the  solution 
statement  for  which  is  cast  using  a  Newton  iteration  algorithm  in  the  form 


m  («q>£;  =  - 


(31) 


where  p  is  the  iteration  index.  The  solution  field  is  defined  as 


'«'£!  h  +  «■<! 


(32) 


and  the  Newton  Jacohian  is  constructed  as 
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All  computational  models  of  unsteady  physical  conservation  laws  eventually 
produce  the  algebraic  equation  system  (Equation  30)  the  numerical  solution  ut 

V 

which  (Equations  31-33)  constitutes  the  heart  of  a  code. 

30.  in  the  finite  element  method  the  Jacobian  or  coefficient  matrix  is 
a  relatively  sparse  matrix  whose  bandwidth  crucially  impacts  upon  the  computa¬ 
tional  costs  of  solving  equation  31.  The  bandwidth  is  dependent  upon  the 
dimensionality  of  the  problem,  cite  number  of  coupled  variables  being  computed, 
the  number  of  elements  and  the  degree  of  the  approximating  polynomials.  Since 
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the  Jacobian  is  independent  of  x  ,  any  direct  solution  procedure  is  usable. 
Most  existing  finite  element  hydrodynamic  models,  e.g.  RMA  models,  employ  a 
solution  scheme  which  reduces  the  bandwidth  before  employing  Gaussian  elimina¬ 
tion.  However,  the  computational  costs  for  multidimensional  (especially  3D) 
models  can  still  become  astronomical  for  time  dependent  problems. 


Suntmarv 


31.  The  two  major  computational  methods,  i.e.  finite  differ* nee  and 
finite  element,  have  been  discussed.  Common  features  as  well  as  major  differ¬ 
ences  have  been  noted.  The  finite  element  method  for  general  equations  must 
be  derived  using  the  method  of  weighted  residuals,  with  the  weight ing  function 
being  the  same  as  that  used  to  approximate  the  dependent  variable.  It  can  be 
shown  for  a  linear  equation  that  this  results  in  the  lowest  possible  solution 
error.  However,  this  can  not  be  shown  for  nonlinear  equations  and,  in  addi¬ 
tion,  as  a  result  of  a  nonzero  surface  integral  on  individual  elements,  deri¬ 
vatives  of  the  dependent  variables  are  not  generally  continuous,  'therefore, 
mass,  continuity  may  not  be  satisfied  locally  in  hydrodynamic,  models,  although 
it  will  be  satisfied  globally,  i.e.  over  the  complete  domain. 

32.  Finite  difference  algorithms  are  normally  derived  by  replacing  the 
derivatives  l.v  difference  quotients  derived  from  Taylor  series  expansions. 

Tiie  lesulting  difference  equations  are  then  evaluated  at  each  net  point.  As 
demonstrated,  however,  there  are  other  approaches,  e.g.  the  use  of  polynomials 
and  the  integral  method.  Each  possesses  features  of  the  finite  element 
method.  In  fact,  if  the  weighting  functions  are  taken  as  constants,  finite 
difterence  type  algorithms  can  be  developed  from  the  method  of  weighted  resid¬ 
uals.  In  particular,  setting  the  constants  equal  to  one  results  in  what  is 
referred  to  ns  a  finite  volume  algorithm.  On  a  structured  grid  with  uutiorm 
grid  spacing,  a  finite  difference  algorithm  that  would  result  from  application 
of  the  integral  method  results.  With  constants  used  for  tile  weighting  func¬ 
tions,  the  continuity  of  derivatives  Is  assured  and  thus  mass  conservation  is 
satisfied  even  on  the  element  level.  However,  the  global  solution  error  may 
be  larger  than  when  the  weighting  function  Is  taken  as  the  basis  function. 

33.  Finally,  it  has  been  demonstrated  that  regardless  of  the  computa¬ 
tional  theory,  ultimately  a  linear  algebra  problem  must  be  solved.  The  manner 


in  which  this  problem  is  handled  is  the  major  factor  that  determines  the  com 
putational  cost  in  three  dimensional  time  varying  hydrodynamic  models. 
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PART  III:  SUGGESTION  FOR  IMPLEMENTATION  IN  THE  SHORT-TERM 


34.  The  major  computational  costs  involved  in  multidimensional  numeri¬ 
cal  hydrodynamic  models  occur  in  solving  the  linear  algebra  problem  given  by 
Equation  31.  Note  that  even  if  the  original  system  of  equations  are  non¬ 
linear,  a  linear  system  of  the  form 


lAHx]  =  {b} 


(34) 


must  be  solved  for  each  Newton  iteration.  Therefore,  the  major  considera¬ 
tion  in  attempting  to  reduce  the  computational  costs  involves  the  manner  in 
which  the  linear  algebra  problem  is  solved. 

35.  As  previously  noted,  the  size  or  bandwidth  of  the  coefficient 
matrix  A  is  dependent  upon  the  problem  dimensionality,  the  number  of  coupled 
dependent  variables,  the  number  of  finite  elements,  and  the  degree  of  the 
interpolating  polynomials.  Anything  that  can  be  done  to  reduce  the  size  of 
A  should  reduce  computational  costs,  regardless  of  the  solution  procedure 
employed  for  the  linear  algebra  problem. 

36.  Depending  upon  the  type  of  element  shapes  employed,  the  complexity 
of  various  integrals  formed  in  the  finite  element  method  varies.  If  numeri¬ 
cal  quadrature  is  required  for  the  evaluation  of  these  integrals  over  each 
element,  a  substantial  portion  of  the  computational  costs  for  one  and  two 
dimensional  hydrodynamic  models  can  be  related  to  construction  of  the  coeffi¬ 
cient  matrix.  Therefore,  one  area  to  consider  for  possible  reduction  of 
computational  costs  is  in  the  construction  of  the  coefficient  matrix. 

37.  Numerical  hydrodynamic  models  are  typically  time  dependent  models 
for  computing  flows  over  time  periods  ranging  from  a  few  tidal  cycles  to 
perhaps  several  months.  In  recent  years  most  models,  both  finite  difference 
and  finite  element,  have  employed  implicit  time  integration  schemes.  How¬ 
ever,  depending  upon  the  application,  explicit  schemes  may  be  more  economi¬ 
cal  and  should  be  considered  as  a  solution  that  could  be  implemented  in  a 
short-term  time  frame. 

Linear  Algebra  Solvers 

38.  Any  numerical  linear  algebra  procedure  is  a  candidate  for  solving 
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Equation  34.  A  key  implementation  aspect  relates  to  code  efficiency.  Finite- 
element  fluid  mechanics  codes  have  gained  a  reputation  for  being  less  effi¬ 
cient  than  their  finite  difference  counterparts.  As  noted  earlier,  this  is 
not  a  consequence  of  the  finite  element  theory  itself,  but  instead  reflects 
directly  on  implementation  choices  for  the  linear  algebra  statement.  Based  on 
the  finite  element  structural  mechanics  historical  development  of  direct, 
out-of-core  solvers,  many  (most)  finite  element  code  implementations  have 
relied  on  direct  extension  of  these  solver  software  packages  to  the  fluid 
mechanics  problem  classes.  In  distinction,  a  direct  solution  (of  even  a 
linear  Poisson  equation)  is  never  attempted  in  a  FD  code,  but  instead  a  matrix 
iteration  procedure  is  defined  that  uses  easily  formed  (usually  block- 
tridiagonal)  approximations  to  the  Newton  algorithm  Jacobian  (Equation  33). 

The  computer  storage  requirement  for  the  approximate  Jacobian  is  negligible  in 
comparison,  as  is  the  CPU  needed  to  execute  an  LU  decomposition  and  back  sub¬ 
stitution.  The  computational  penalty  is  a  much  reduced  convergence  rate  in 
comparison  to  a  Newton  iteration,  but  each  iteration  proceeds  so  rapidly  that 
Che  many  grid  sweeps  required  for  the  FD  code  are  typically  completed  with 
minimal  expenditure  of  computer  resources. 

39.  Candidate  methods  employed  in  finite  difference  models  include 
point  iterative  methods  (Jacobi,  Gauss-Seidel ,  successive  over-relaxation), 
line  iterative  methods  (vertical/horizontal  over-relaxation,  alternating 
direction  implicit,  approximate  factorization),  conjugate  gradient  methods  and 
multi-grid  techniques. 

40.  The  RMA  models  employ  a  direct  solver  called  a  frontal  solution 
procedure.  In  the  frontal  method  the  assembly  and  elimination  of  equations  is 
interweaved,  based  on  the  fact  that  an  equation  can  be  eliminated  if  all  the 
elements  contributing  to  the  equations  are  assembled.  The  front  is  defined  by 
all  the  active  coefficients  that  are  not  yet  eliminated.  Thus,  the  equations 
are  not  assembled  and  eliminated  sequentially.  New  equations  take  up  the 
positions  of  equations  which  have  ceased  to  be  active  and  the  front  matrix 
contains  only  a  small  number  of  zero  coefficients.  When  employed  in  a  finite 
element  context,  the  efficiency  of  a  frontal  scheme  is  dependent  upon  the 
ordering  of  the  elements,  with  the  ordering  of  the  nodes  being  immaterial 
since  the  equations  are  assembled  on  an  element-by-element  basis.  A  detailed 
discussion  of  the  frontal  approach  is  given  by  Irons  and  Ahmad  (1980).  Sloan 
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and  Randolph  (1983)  show  that  an  upper  bound  on  the  number  of  operations  Is 
given  by 


N  2 

T  =  |  (W  +  W  -  2) 


(35) 


where  W  is  the  maximum  front  width  and  the  rank  of  the  coefficient  matrix  is 

N.  By  comparison,  Gaussian  elimination  operating  directly  on  [A]  involves  a 

3 

total  number  of  operations  proportional  to  N  . 


Structured  vs  Unstructured  Grids 


41.  The  solution  cost  can  be  directly  related  to  the  type  of  grid 
employed.  A  mesh  or  grid  is  said  to  be  structured  or  regular  if  the  geometric 
domain  is  divided  into  q  rows  with  p  elements  per  row.  With  regular 
grids,  the  unknowns  are  ordered  such  that  the  nonzero  entries  of  the  coeffi¬ 
cient  matrix  lie  in  a  predictable  pattern.  A  completely  unstructured  grid 
implies  that  the  non-zero  entries  are  scattered  rather  haphazardly  throughout 
the  matrix.  Between  these  two  types  of  grids  are  semi  regular  grids.  With 
semi-regular  grids  the  domain  is  divided  into  rows  and  columns  but  there  may 
be  a  variable  number  of  elements  per  row.  Figure  1  illustrates  the  three 
types  of  grids. 

42.  Usually,  the  more  flexible  and  general  a  model  is  the  greater  are 
its  solution  costs.  Therefore,  allowing  for  completely  unstructured  grids  in 
the  RMA  codes  significantly  increases  the  matrix  solution  cost.  The  factors 
which  most  affect  matrix  solution  costs  are  total  arithmetic  operations  re¬ 
quired,  storage  requirements  and  overhead  due  to  data  transmission  and  to 
logical  operations  associated  * ith  the  solution  method.  The  relative  impor¬ 
tance  is  dependent  upon  the  computer  being  used  and  the  size  of  the  problem 
being  solved. 

43.  Many  efficient  direct  methods  exist,  e.g.  the  frontal  solution 
scheme  employed  in  the  RMA  codes;  however,  depending  upon  the  problem  an 
iterative  method  can  be  more  efficient.  Among  the  factors  to  be  considered 
are  computer  storage  requirements  and  the  number  of  arithmetic  operations  re¬ 
quired.  Hageman  and  Young  (1981)  state  that  for  many  problems  there  is  a 
cross-over  point  in  the  number  of  unknowns,  above  which  a  good  iterative 
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method  becomes  more  cost  effective  than  a  good  direct  method.  In  particular, 
they  indicate  that  for  unstructured  grids  the  block  CCSI  (cyclic  Chebyshev 
semi-iterative)  scheme,  used  with  an  algorithm  to  minimize  bandwidth,  is 
worthy  of  consideration.  A  recent  paper  by  Mavriplis  and  Jameson  (1987) 
demonstrates  that  the  efficiency  of  a  multigrid  solution  on  unstructured  tri¬ 
angular  meshes  employing  Jacobi  iteration  is  competitive  with  available  struc¬ 
tured  mesh  Euler  solvers.  Therefore,  one  measure  worth  considering  for 
implementation  in  codes  such  as  the  RMA  codes  is  an  iterative  solution  scheme. 
A  more  permanent  solution  to  the  reduction  of  linear  algebra  costs  is  to  re¬ 
duce  the  generality  of  the  model  to  allow  at  most  only  semistructured  grids 
such  that  a  tensor  product  factorization  of  the  coefficient  matrix  is  pos¬ 
sible.  Through  the  use  of  grids  composed  of  blocks,  significant  generality  is 
retained.  With  such  an  approach  the  finite  element  method  becomes  a  viable 
solution  method  for  computing  long  term  hydrodynamics.  This  approach  is  dis¬ 
cussed  in  PART  IV. 

Reduction  of  the  Size  of  the  Coefficient  Matrix 

44.  The  size,  i.e.,  bandwidth  of  the  coefficient  matrix  in  the  linear 
algebra  problem  is  dependent  upon  the  number  of  coupled  partial  differential 
equations  to  be  solved,  the  dimension  of  the  problem,  the  number  and  ordering 
of  the  finite  elements  and  the  degree  of  the  basis  function,  i.e.,  the  approx¬ 
imating  polynomial.  Thus,  in  order  to  reduce  the  size  of  the  matrix  one  or 
more  of  the  above  factors  must  be  addressed.  Since  the  problem  dimension  and 
the  number  of  finite  elements  employed  are  problem  dependent  these  are  not 
considered.  However,  the  coupling  of  equations  and  the  completeness  of  the 
basis  function  are  certainly  areas  in  which  modifications  should  be 
considered. 

Reduction  of  matrix  rank 

45.  Typical  finite  element  hydrodynamic  codes,  e.g.  the  RMA  models,  are 
completely  implicit  in  that  all  variables  are  coupled.  For  example,  in  the  3D 
RMA-10  model  the  coupled  variables  become  the  water  surface,  three  velocity 
components  and  salinity.  One  approach  taken  in  numerical  hydrodynamic  models 
that  employ  the  finite  difference  method  is  to  compute  the  water  surface 
implicitly  through  the  derivation  of  a  frictionally  damped  wave  equation  and 
to  then  use  the  implicitly  computed  water  surface  in  explicit  computations  for 


the  velocity  components,  which  are  then  employed  in  computations  for  the  sa¬ 
linity,  i.e.,  the  equations  are  uncoupled.  With  this  approach,  restrictions 
on  the  allowable  time  step  will  occur;  however,  the  stability  criteria  are 
primarily  related  to  the  speed  of  a  water  particle  rather  than  the  speed  of  a 
free  surface  wave,  which  can  often  be  quite  restrictive. 

46.  With  this  approach  employed  in  RMA-10,  five  matrix  equations  would 
need  to  be  solved  but  each  matrix  would  only  be  1/5  the  size  of  the  matrix 
arising  from  the  coupled  case.  Since  the  arithmetic  operations  required  for 
solution  are  proportional  to  the  rank  of  the  matrix  times  the  square  of  the 
bandwidth,  obviously  the  solution  of  five  small  problems  will  be  much  less 
expensive  than  that  of  the  large  problem.  This  should  be  true  even  though  the 
computational  time  step  allowed  will  be  smaller  than  that  for  the  completely 
implicit  case. 

Bandwidth  reduction 

47.  The  second  way  in  which  the  size  of  the  coefficient  matrix  can  be 
reduced  is  related  to  the  degree  of  the  approximating  polynomial.  The  greater 
the  degree  of  the  polynomial  the  denser  the  coefficient  matrix  becomes  since 
the  higher  degree  elements  connect  nodes  over  a  larger  stencil.  Thus,  the 
bankwidth  is  significantly  larger.  Baker  (1983,  Ch.  3)  compares  the  computa¬ 
tional  requirements  for  a  potential  flow  solution  using  linear  and  quadratic 
approximating  polynomials.  A  discretization  containing  624  nodes  was  solved 
in  both  cases.  The  quadratic  solution  required  approximately  twice  the  com¬ 
puter  storage  and  40  times  the  CPU.  The  two  solutions  were  comparable  in 
accuracy. 

48.  In  the  RMA  codes,  the  water  surface  is  approximated  with  a  linear 
basis  function;  whereas,  the  velocity  components  are  approximated  by  a  qua¬ 


dratic.  For  a  non-free  surface  incompressible  fluid,  the  velocity  is  required 
to  be  approximated  by  a  polynomial  of  one  degree  greater  than  the  pressure. 
However,  free  surface  hydrodynamic  computations  are  analogous  to  compressible 
flow  computations  in  aerodynamics.  For  example,  hydraulic  jumps  develop  when 
the  water  velocity  become  supercritical  while  shock  waves  develop  in  air  when 
the  Mach  number  exceeds  one.  Finite  element  researchers  in  the  aerodynamics 
field  routinely  employ  the  same  degree  basis  function  for  the  velocity  and 
pressure.  Therefore,  one  possible  means  for  reducing  the  computational  costs 
of  codes  such  as  the  RMA  finite  element  models  would  be  to  employ  a  linear 
basis  function  for  the  velocity  components  also.  Naturally  this  will  reduce 
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the  theoretical  accuracy  of  the  computations  for  a  fixed  mesh  size.  However, 
as  illustrated  by  the  potential  flow  solution  example  above,  this  may  be  a 
preferred  alternative.  One  point  to  consider  is  that  rather  large  eddy  dif¬ 
fusion  coefficients  must  often  be  employed  in  models,  such  as  the  RMA  codes, 
that  employ  the  standard  Galerkin  finite  element  method  in  order  to  dampen 
dispersion  error  waves  that  are  created.  The  use  of  eddy  coefficients  multi¬ 
plied  by  second  derivatives,  however,  does  not  provide  selective  damping. 
Therefore,  the  flow  solution  of  interest  is  also  damped  with  the  net  effect 
being  that  a  lower  order  approximation  with  built  in  numerical  diffusion  might 
as  well  have  been  used.  A  discussion  of  finite  element  solutions  based  on  the 
Taylor  weak  statement,  which  provides  selective  numerical  damping  (dampens 
primarily  the  short  wave  lengths)  is  presented  in  PART  IV. 

Static  condensation 

49.  One  additional  method  for  reducing  the  size  of  the  coefficient 
matrix  is  referred  to  as  static  condensation.  This  technique  can  only  be 
applied  to  linear  equations  using  higher  degree  basis  functions.  The  proce¬ 
dure  involves  reducing  out  the  interior  nodes  of  a  higher  degree  element  from 
the  problem.  The  effect  is  an  overall  reduction  in  the  rank  of  the  coeffi¬ 
cient  matrix  and  thus  substantial  potential  savings  in  computer  costs.  Of 
course,  due  to  the  nonlinearity  of  the  hydrodynamic  equations  static  conden¬ 
sation  can  not  be  applied  unless  the  equations  are  linearized.  However,  this 
is  often  done  in  finite  difference  models  where  the  nonlinear  convective 

<)u 

terms,  e.g.  u  ^  ,  are  handled  by  taking  the  u  at  the  previous  time  step 

or  perhaps,  as  in  the  uncoupled  frictionally  damped  wave  equation  approach, 
the  complete  term  is  lagged  at  the  previous  time  step.  This  effectively 
linearizes  the  equations  at  a  time  step  and  thus  static  condensation  could  be 
employed  if  higher  degree  basis  functions  are  assumed. 

50.  One  additional  comment  concerning  the  lagging  of  the  nonlinear 
terms  at  the  previous  time  step  should  be  made.  With  such  an  approach,  the 
matrix  equation  is  solved  only  once  per  time  step  as  opposed  to  the  iterative 
Newton  Raphson  scheme  employed  in  models  such  as  the  RMA  codes  to  solve  sys¬ 
tems  of  nonlinear  algebraic  equations.  The  RMA  codes  commonly  require  2-3 
iterations  per  time  step.  Lagging  the  nonlinear  terms  will  immediately  reduce 
the  computational  cost  by  a  significant  factor,  even  if  nothing  else  is  done. 
Lagging  the  nonlinear  terms  should  not  significantly  influence  the  solutions 
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generated  by  codes  such  as  the  RMA  models  since  they  are  normally  applied  for 
the  computation  of  gradually  varying  phenomena,  e.g.  tidal  circulation  in 
estuaries.  This  would  not  be  the  case  if  the  models  were  intended  to  compute 
rapidly  varying  surges  or  perhaps  hydraulic  jumps,  i.e.,  highly  nonlinear 
phenomena. 

Reduction  of  Costs  in  Problem  Setup 

51.  In  the  finite  element  method,  integrals  involving  the  basis  func¬ 
tions  and  derivatives  of  the  functions  arise.  If  the  finite  elements  are  re¬ 
stricted  to  be  triangles  in  two  dimensions  (N  =  2)  and  tetrahedrons  in  three 
dimensions  (N  =  3) ,  all  integrals  can  be  analytically  evaluated  through  the 
use  of  the  expression  below 
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for  linear  as  well  as  quadratic  basis  functions.  However,  if  quadrilaterals 
and/or  quadrahedrons  are  employed  the  resulting  integrals  can  not  in  general 
be  evaluated  analytically.  Therefore,  the  integrals  must  be  evaluated  numeri¬ 
cally  with  Gaussian  quadrature  commonly  employed.  This  of  course  increases 
the  computational  costs  involved  in  forming  the  matrix  equation,  with  these 

costs  being  directly  related  to  the  order  of  the  quadrature  employed.  Since 

2k 

the  quadrature  error  is  0(h)  ,  where  k  is  the  order  of  the  quadrature 

formula  and  h  is  a  measure  of  the  mesh,  using  k=2  should  be  quite  suffi¬ 
cient  for  models  employing  quadratic  basis  functions.  Reducing  the  order  of 
the  numerical  quadrature  in  finite  element  models  to  be  no  greater  than  the 
order  of  the  basis  function  employed  will  reduce  problem  setup  costs. 

Implicit  vs  Explicit  Time  Integration 


52.  The  finite  element  algorithm  applied  to  time  dependent  partial  dif¬ 
ferential  equations  yields  a  large  order  system  of  ordinary  differential  equa¬ 
tions.  In  other  words,  the  matrix  equation  involves  unknowns  that  are  time 
derivatives  of  the  nodal  values  of  the  dependent  variables.  Any  integration 


formula  can  be  applied  to  transform  the  system  of  ordinary  differential  equa¬ 
tions  to  an  algebraic  system. 

53.  The  general  form  of  the  system  of  ordinary  differential  equations 
becomes 

[C]  d{Q(t) }  +  [D]  (Q(t) }  =  {B}  (37) 

dt 


Employing  a  one  step  time  integration  algorithm  yields 


,n+l 


[c]  +  ate  [D]  {q}  +  (at(i-e)fD]  -  [c]  {q}“  =  at  {b} 
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(38) 


where  <>  controls  the  implicitness,  e.g.,  6=1  yields  a  completely  implicit 

scheme  while  6=0  results  in  an  explicit  time  integration  scheme,  i.e.,  the 
forward  Euler  integration  scheme.  An  interesting  observation  is  that  even  if 
6=0  ,  i.e.,  the  time  integration  is  explicit,  a  system  of  algebraic  equations 
remain  to  be  solved  in  an  implicit  fashion  since  the  matrix  [C]  is  not  a  diag¬ 
onal  matrix.  This,  of  course,  is  unlike  the  finite  difference  method  which 
yields  (by  delinition)  a  diagonal  recursive  relationship  for  the  unknowns  when 
an  explicit  time  integration  scheme  is  employed. 

54.  Based  upon  the  the  fact  that  [C]  is  not  diagonal,  it  would  appear 
foolish  to  employ  an  explicit  time  integration  scheme  with  the  finite  element 
method  since  essentially  the  same  linear  algebra  problem  remains,  but  with  the 
computational  time  step  restricted  by  stability  criteria.  One  solution  is  to 
diagonalize  [C].  As  discussed  by  Baker  (1983,  Ch.  4),  there  are  two  common 
approaches  taken.  One  method  is  row  wise  summation,  i.e.. 
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whereas,  the  second  is  normalization  of  the  diagonal  elements  of  [C]  with  the 
off  diagonal  terms  then  set  to  zero.  The  non-diagonalized  matrix  is  referred 
to  as  a  consistent  matrix,  whereas,  the  diagonalized  form  is  referred  to  as  a 
condensed  matrix. 

55.  Baker  (1983,  Ch.  4)  presents  solutions  for  the  one  dimensional 
diffusion  equation  (without  convection)  for  using  both  the  consistent  and 


condensed  forms  on  the  same  grid.  The  grid  used  was  rather  coarse,  and  dis¬ 
persion  error  "wiggles"  developed  in  the  consistent  solution.  However,  the 
wiggles  disappeared  in  the  condensed  solution,  confirming  that  the  condensing 
operation  yielded  an  artificial  smoothing  of  the  solution.  It  might  be  noted 
that  the  maximum  allowable  time  step  for  explicit  integration  of  the  condensed 
form  was  about  twice  that  of  the  consistent  form. 

56.  To  further  illustrate  that  finite  difference  recursion  relations 
can  be  derived  from  the  weighted  residual  method  used  for  generating  the 
finite  element  method.  Baker  (1983,  Ch.  4)  shows  that  for  a  convection- 
diffusion  problem  in  one  dimension,  with  a  linear  basis  function,  the  finite 
element  method,  employing  the  trapezoidal  rule  time  integration  algorithm, 
yields  what  is  referred  to  as  the  Chapeau  finite  difference  algorithm  on  a 
uniform  discretization.  If  the  trapezoidal  rule  integration  is  employed,  but 
the  coefficient  matrix  multiplying  the  vector  containing  the  unknown  deriva¬ 
tives  of  the  nodal  unknowns  is  diagonalized,  the  resultant  recursion  relation 
becomes  identical  to  the  Crank-Nicolson  finite  difference  scheme. 

57.  It  appears  that  if  the  hydrodynamics  of  a  water  body  are  to  he  com¬ 
puted  on  an  unstructured  grid,  an  explicit  time  integration  scheme  with  a  con¬ 
densed  matrix  should  be  considered  for  implementation.  Even  though  the  time 
step  will  be  restricted,  the  linear  algebra  problem  becomes  a  trivial  opera¬ 
tion  and  the  computational  costs  may  well  be  significantly  less.  This  will 
especially  be  true  if  the  water  surface  is  computed  implicitly  with  all  other 
variables  computed  explicitly  using  the  condensed  form,  since  the  most  severe 
stability  criterion,  i.e.,  the  free  surface  wave  speed,  will  be  removed  from 
the  stability  criteria. 
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PART  IV:  LONG  TERM  (PERMANENT)  SOLUTION  APPROACH 

58.  The  discussion  up  to  this  point  has  focused  on  ideas  for  reducing 
computational  costs  that  can  be  implemented  in  a  relatively  short  period  of 
time.  A  longer  term  research  effort  following  the  approach  outlined  below 
will  yield  not  only  a  computationally  efficient  finite  element  model  but  one 
with  excellent  stability  properties.  Much  of  the  suggested  approach  is  a  re¬ 
sult  of  a  study  of  the  fairly  typical  finite  element  hydrodynamic  code  RMA-10 
which  was  developed  by  King  (1982). 

Problem  Statement 

59.  The  Reynolds-averaged  Navier-Stokes  equations,  Cebeci  and  Smith 
(1974),  are  considered  appropriate  for  hydrodynamic  three-dimensional  flows. 
The  form  usually  considered  as  baseline  for  hydrodynamics  analyses  is  given 
by  King  (1982)  as, 
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where  x  =  (x,v,z}  is  the  cartesian  coordinate  system,  u  =  iu.v.w}  is  the 
i  i 

corresponding  velocity  field  expressed  in  cartesian  components,  g  is  grav¬ 
ity,  p  is  pressure,  and  P  is  the  water  density.  Further,  ,  f  , 

etc.  are  the  turbulent  eddy  viscosity  coefficients,  and  t*  ,  t*  ,  t*  are 
the  surface  tractions  operating  on  the  boundaries  only.  Figure  2  schemati¬ 
cally  represents  the  water  body  and  conventional  directions  for  the  axes  and 
velocities.  For  the  RMA-10  analysis,  it  is  assumed  that  the  vertical  momentum 
equation  (42)  may  be  reduced  to  the  form, 


pg  +  If  =  0 


i.e.,  the  pressure  in  the  vertical  (z)  direction  is  hydrostatic. 

60.  RMA-10  handles  both  homogeneous  and  stratified  flow  problem  de¬ 
scriptions.  Taking  the  general  case,  the  assumption  of  Equation  44  yields  the 
pressure  gradient  expressions 
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For  the  homogeneous  flow  assumption,  Equations  45-46  then  simplify  to, 
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where  the  definitions  h(x,y,t)  and  a(x,y)  are  noted  in  Figure  2.  The  RMA-10 
equation  set  is  completed  with  the  inclusion  of  an  advection-dif fusion  equa¬ 
tion  for  transport  of  scalar  fields,  eg.,  temperature,  salinity,  sediment,  in 
the  form  (with  temperature  as  the  example) , 
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where  V  =  (u  +  v“)  ,  the  surface  velocity  magnitude  and  C  is  the  Chezy 

coefficient.  These  stresses  are  not  constant  and  thus  are  incorporated  into 
the  nonlinear  solution  dependent  upon  the  local  velocities  (only  over 
surfaces) . 

61.  As  a  final  formulational  step,  the  hydrostatic  pressure  assumption 
deletes  the  remaining  terms  in  Equation  42  as  higher  order.  Thus,  the  con¬ 
tinuity  equation  43  becomes  that  governing  the  vertical  velocity  component  w. 
Integrating  over  z  yields 

/a+h  y  3  fa+h  . 

a  +  3y^  dz  =  a  af  dz  =  -w(x,y,a+h)  +w(x,y,a)  (52) 

At  the  top  surface, 


w(x,y,a+h) 


While  at  the  bottom. 


r  .l\  3 (a+h)  ,  ,  ,LX  3 (a+h)  .  ?h 

u(x,y,a+h)  — -  +  v(x,y,a+h)  — 


w(x,y,a)  *  u(x,y,a)  +  v(x,y,a) 


Combining  Equation  52-54  yields, 


a+h  ^  a+h 

Ja  UdZ  +  ^Ja  V 


dz  +  =  0 


Thus,  with  Equation  55  the  vertical  velocity  appears  only  in  the  lateral  (x,y) 
momentum  equations.  RMA-10  treates  only  a  minor  coupling,  computing  w  sepa¬ 
rately  from  the  basic  continuity  Equation  43.  The  pressure  p  can  be  elimi¬ 
nated  via  Equations  47-48  and  replaced  by  depth  h  which  is  a  function  only 
of  x  ,  y  and  t  ,  as  governed  by  Equation  55.  Finally,  Equation  49 


►•--'V* ,  v„%v \vx\-.- 

& 

completes  the  set  for  scalar  field  convection-diffusion  along  with  p=p(S,T). 

62.  RMA-10  code  implementation  of  a  finite  element  algorithm  for  solu¬ 
tion  of  Equations  40,  41,  43,  49  and  55  utilizes  a  coordinate  transformation 
to  a  "regularized"  solution  domain,  which  in  itself  creates  potential  numeri¬ 
cal  instabilities.  Contemplating  the  need  to  alter  this,  the  first  step  to 
long  term  generalization  would  be  to  re-express  the  governing  hydrodynamic 
equation  system  in  the  preferred  form  of  a  (hyperbolic)  conservation  law 
statement.  The  computational  aerodynamics  literature  has  become  filled  with 
the  observation  that  this  form  is  preferred  for  a  variety  of  reasons  including 
truncation  error  control,  stability  and  exact  satisfaction  of  sharp  solution 
gradient  (approximations).  Thus,  without  loss  of  any  generality,  the  first 
step  would  be  to  complete  the  details  of  re-expression  of  Equations  40-43  and 
49  in  the  form. 


L(P)  =  -i-(pu,)  =  +  Ifil  +  2fi2  =  o 

w  3x.  w  i  3x  3y  3z 


3(PU1)  a  /  \ 

L(pui)  =  at  +  aTT  ( ujpui  +  p5ij  "  °ij]  =  0 


L(o^)  =  (uiP<t,  -  d.y  =  0 


L(h)  =  !r  +  db  v  +  f(a’h)  +sh  =  0  (5' 

J 

Equation  56  expands  the  summation  index  convention,  the  variables  have  the 
same  interpretation,  and  i  represents  any  scalar  field. 

63.  The  generalized  stress  tensor  o  is  constituted  of  its  laminar 
and  Reynolds  averaged  contributions, 


°lj  ’  “Elj  -  »ui“j 


and  the  integral  of  E,.  =  3u./3x.  +  3u./3x,  yields  the  surface  stresses 

1 1  i  1  .1  i  _ _ 

explicitlv  expressed  in  Equations  40-42.  The  Reynolds  stress  u^uj  *  as 


V 

C 


i 


as  the  diffusion  flux  in  Equation  58,  would  be  expressed  in  a  gradient- 

diffusion  law  statement,  i.e., 


t  / 3u  3u,  \ 

i!u'.  =  v,.  (~  +  ~ 1  )  ,  d.  =  -  k.  . 
i  j  jk\3x  3xi  /  3  lj  3xi 


where  is  a  turbulent  eddy  viscosity  tensor  and  k_^  is  the  correspond¬ 

ing  diffusion  tensor  for  <5  .  The  terms  s.  and  s,  are  the  corresponding 

P  h 

source  terms,  and  f(a,h)  is  a  functional  form  to  be  established  that  contains 
the  differentiability  required  in  conversion  of  Equation  55  to  59.  Once  this 
is  completed,  the  3D  hydrostatic  form  of  the  governing  equation  system  can  be 
compactly  stated  as. 


L(<!)  ’  ft  +  S*  +  s  '  0 
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where  the  dependent  variable  set  is  q  =  (pw,  pu,  pv,  pj5,  h) .  The  exact  forms 
for  the  corresponding  flux  vectors  f^  ,  and  source  term  s  would  be  derived. 
Equation  62  expresses  the  governing  equation  system  in  the  preferred  conserva¬ 
tion  law  statement  form. 


Transformation  for  Variable  h 

64.  The  RMA-10  computer  program  employs  a  coordinate  transformation  to 
regularize  the  solution  domain  and  eliminate  the  explicit  appearance  of  a 
time-varying  upper  surface  (h)  location.  Figure  3  illustrates  the  transfor¬ 
mation,  which  is  defined  by  King  (1982)  as 


h(x,y,t)z'  =  z  -  a(x,y)  ,  x' 


y'  =  y 


Thus , 


-(x,y,z(z’)> 
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(x,y , z (z ' ) )  = 
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— (x  V  z(z'))  =  3-U-'-(x’  V*  z')  -  —  (— z')  ^u' 
3t(x,y,zu  ))  3t  (x  ,y  ,z  )  h  (^z  ) 


(67) 


where  ur  =>  u' (x' ,y' ,z'h+a) .  Similar  expressions  are  generated  for  v'  and  w' , 
which  when  substituted  into  the  vertically  integrated  continuity  equation  55 
yields , 


a)  Domain  Grid 

In  Physical  Space 


b)  Transformed 
Solution  Domain 


Figure  3.  TABS-3  Solution  Domain  Transformation, 
From  King  (1982,  Figure  2.2) 
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65.  After  integrating  and  proceeding  through  the  remaining  equations, 
expressing  all  variables  in  the  primal  system  then  yields  the  final  3D  govern¬ 
ing  equation  system  in  RMA-10  as 


35 


t 


J0l( h  (hu,)  +  h  (hv,))dz' +  It  = 0 


(69) 
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(72) 


.  3T  ,3T  .  t3T  .  ,3T  ,  ,  .  ,  ,  ,  3h.3T 

h3 1  +  hu  ^rr  +  hv  w  +  w  —  -  <u  sv»  +  v  S„1  +  z  ^r)^TT 


3x 


3y '  3z 


3t  3z ’ 


(73) 


,  3  rn  3T 
"  h3x,(Dx'3x,)  " 


.  3  3T  .  3  3T  . 

h3yT<Dy,8y,)  ”  az'^a'az’5  ~  h0s  " 


Equations  70-73  are  completed  with  the  state  equation  p=p(S,T)  ,  and  after 
transformation, 


P 


(74) 


The  prime  notation  on  the  eddy  diffusion  coefficients  indicates  their  approxi¬ 
mate  similarity  to  the  original  coefficients. 

66.  King  (1982)  notes  that  a  price  is  paid  in  terms  of  mathematical 
consistency  for  the  domain  coordinate  transformation.  Specifically,  parallel 
flow  at  the  top  or  bottom  of  the  new  system  transforms  to  flow  parallel  to  the 
surface  in  the  original  system.  Thus,  at  slope  changes  in  the  bottom  profile 
there  is  a  discontinuity  in  the  flow  trajectory  and  in  the  magnitude  of  the 
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vertical  component.  For  example,  (see  Figure  3a),  the  vertical  velocity  is 
zero  in  the  left-hand  element  and  finite  in  the  right-hand  element.  Thus,  to 
record  velocities  at  nodes  for  output  requires  averages  be  taken  and  slight 
inconsistencies  are  generated.  Vertical  steps  (or  sharp  changes)  in  the  bot¬ 
tom  profile  can  only  be  treated  by  approximate  boundary  condition  specifica¬ 
tions,  and  in  general  the  approach  is  unable  to  accurately  describe  such  a 
system.  Flow  separation  is  not  restricted,  however,  and  the  coordinate  trans¬ 
formation  does  ensure  global  satisfaction  of  continuity. 

67.  Definition  and  use  of  this  coordinate  transformation  requires  a 
close  examination,  to  be  conducted  in  this  second  step  analysis,  since  realis¬ 
tic  bed  profiles  containing  fairly  sharp  changes  in  a(x,y)  will  destabilize  a 
RMA-10  computation.  (Certainly  to  proceed  to  a  vectorization  with  this  in¬ 
trinsic  limitation  does  not  make  sense  in  the  longer  term.)  The  approach 
would  be  to  examine  the  generalized  (body-fitted)  coordinate  transformation 
concepts  employed  throughout  aerodynamics  (see  Thompson,  et  al,  1985)  to  the 
developed  conservation  law  statement  form  ie..  Equation  62.  The  functional 

form  is  essentially  identical  to  Equation  63,  ie.,  x'  =  x'  (x.,t)  ,  where 

—  —  J 

the  time-dependence  is  contained  in  h=h(x,y,t).  The  basic  step  is  to  re¬ 
express  the  derivative  operators  in  the  conservation  law  statement  into  the 
primed  coordinate  system.  Quite  simply,  this  is  no  more  than 


where  the  bracket  with  subscript  notation  implies  derivatives  with  the  cor¬ 
responding  variable  held  fixed.  That  is,  the  left  side  of  Equation  75  is  the 
time  derivative  in  transformed  space  where  the  surface  h(x^)  appears  fixed 
(in  time).  The  time  derivative  on  the  right  is  as  seen  in  physical  space,  and 


in  the  third  term  (3x  /3t) '  =  x.  is  the  (grid)  speed  of  the  physical  coor- 

3  xl  3 

dinate  system  (nodes)  as  seen  in  the  fixed  (x!)  system.  Hence,  for 


Equation  62 


i  < 


3x '  I  ,  3x . 
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Re-ex^ressing  the  second  derivative  in  the  first  line  of  Equation  76  in  terms 
of  x?  ,  the  accounting  for  the  variable  lid  h=h(x,y,t)  is  then  intrinsi¬ 
cally  embedded  in  the  conservation  law  statement  when  rewritten  as, 
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68.  The  specific  form  for  Equation  77  must  be  derived  for  arbitrary 
(within  the  validity  of  the  hydrostatic  assumption)  surface  forms  h=h(x,  t)  . 
The  net  output  will  be  the  specific  form  for  the  grid  velocity  x^  ;  whereupon 
Equations  77  and  70-73  can  be  exactly  compared  term  by  term.  Of  importance, 
no  a  priori  constraint  is  implied  on  variation  of  bed  profile  a(x,y)  .  Once 
established,  the  form  of  Equation  77  is  ideally  suited  to  the  next  two  steps. 

Body  Fitted  Coordinate  Transformations 

69.  This  item  is  the  significant  third  step  upon  completion  of  step  2. 
Finite  elements  have  always  been  viewed  for  their  geometric  versatility,  ie., 
one  can  readily  triangularize  (or  discretize  into  unions  of  tetrahedra,  pen- 
tahedra  and  hexahedra  in  3D)  any  region  with  geometrically-complicated  bound¬ 
ary.  Figure  3  gives  an  elementary  example.  However,  especially  for  a  moving 
boundary,  and  in  general,  highly  structured  coding  procedures  are  one  natural 
consequence  of  embedding  a  generalized  coordinate  transformation  to  the  con¬ 
servation  law  statement  before  writing  the  finite  element  algorithm  statement. 
Equation  77,  as  the  successor  to  eqn  62,  is  already  written  in  the  desired 
form  which  is  therefore  immediately  applicable  for  use  with  any  (body-fitted) 
coordinate  transformation.  For  example,  the  finite  element  algorithm  state¬ 
ment  for  Equation  77  would  be  immediately  applicable  to  analysis  of  Charleston 
Harbor,  Figure  4a,  or  to  flow  past  an  airfoil.  Figure  4b.  Conversely,  it 
would  also  be  immediately  extensible  to  use  with  adaptive  mesh  procedures  to 
place  finer  griddings  in  regions  of  sharp  solution  gradients.  Figure  4c. 


rf.Vt, 


70.  The  formulational  details  of  the  body-fitted  coordinate  transforma¬ 
tion  implementation  must  be  completed.  The  Galerkin  weak-statement  finite 
element  algorithm  (the  theoretical  basis  for  RMA-10)  form  of  Equation  77  is 
well  known.  Specifically,  q  is  approximated  by  a  projection  onto  functions 


generated  from  a  suitable  space  of  trial  functions  S^cHj  yielding, 


q  =  qh(x.,t)  =  £tj>  (x  )  Q  (t) 

3  3  3 


or  conversely,  expressed  in  terms  of  the  cardinal  basis  (N  (n.)}  over  the 

k  j 


union  of  elements  of  the  discretization. 


qh  =  U  {Nk(n  )}T  (Q(t)} 


71.  Figure  5  shows  the  elements  available  in  the  RMA-10  program,  for 


which  the  cardinal  bases  (N  (n.)}  are  well  known  (and  in  which  n.  is  a  local 

11  3  j 


coordinate  system).  The  Galerkin  weak  statement  for  Equation  77  is  then. 


,  H\}  Mqh) 
J  n  cRn 


dx  =  {0} 


where  U  is  the  generic  finite  element  domain,  recall  Figure  5.  For  k=2  , 
6 


the  quadratic  basis  is  defined,  as  is  utilized  in  RMA-10  for  the  two  trans¬ 
verse  momentum  equation,  while  k=l  defines  the  linear  basis  as  used  by 
RMA-10  for  the  integral  continuity  equation.  Neglecting  for  the  moment  the 
grid  velocity  term,  and  using  for  example  the  scalar  convection-dif fusion 
equation  (58),  and  anticipating  the  embedding  of  h,  the  expanded  form  of 
Equation  80  is. 
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72.  In  Equation  81,  the  superscript  "h"  on  [  ]  denotes  use  of  Equa¬ 
tion  79,  and  Sg  is  the  familiar  finite  element  assembly  operator.  Using  the 
Green-Gauss  form  of  the  divergence  theorm  in  Equation  81  to  transport  the 
functional  support  for  3/3x!  yields  the  flux  vector  term  as, 


e 


73.  The  basic  form  of  the  diffusion  term  is  immediately  recognized, 

where  k.  is  the  directional  coefficient.  One  can  recognize  that  3(N,  }/3x' . 

3  k  j 

is  a  direct  operation,  and  the  key  step  is  to  use  (N^l  to  interpolate  the 

coordinate  transformation  on  the  (each)  element  domain  •  Denoting  the 

(global)  node  coordinates  as  the  arrays  (x)  ,  (y)  and  (z)  ,  ie.,  (Xl)  ,  and 

e  e  e  e 

using  Equation  79,  yields  the  local  coordinate  transformation 


X,  =  {Nf (n.))  T(xil 
i  k  j  e 


(83) 


As  detailed  in  Baker  (1983,  Ch.  5),  the  Jacobian  of  the  forward  transformation 
on  is  readily  evaluated  as  a  rank  n  square  matrix.  The  Jacobian  of  the 
inverse  transformation  can  be  formed  analytically,  thus  yielding  the  form. 


d^r  (ETAJI), 


(84) 


where  the  l  <  (I,J)  <  n  entries  in  the  array  (ETAJI)  involve  distributed 

differences  in  the  entries  in  (Xl)  . 

e 

74.  Noting  that  dx  =  det  J'  dx’ ,  and  denoting  the  (contravariant) 


components  of  the  velocity  field  u^  , 
curves  of  the  x'^  coordinate  system, 
tion  82  for  a  typical  element  becomes 


that  are  parallel  to  the  coordinate 
as  Uj  ,  the  first  term  in  Equa- 


det  J'dx' 


{N)T{RHPHI)  dx' 
e 


(85) 


The  integrals  defined  in  Equations  85  are  evaluable  using  standard  techniques. 

Det  J'  has  cancelled  out.  [UBARJ]  contains  the  nodal  values  of  velocities 

e  < 

parallel  to  the  x'^  coordinate  system  (hence,  parallel  to  the  free  surface 

and  {RHPHI}  contains  the  nodal  values  of  (ph<5)  .  The  diffusion  term  involves 
e  e 

(outer)  products  in  (ETAJI)^  (ETALD^/det  J',  and  its  evaluation  is  straight¬ 
forward.  This  outlines  the  generalized  coordinates  formulation  for 
Equation  77. 

Enhanced  FE  Algorithm  Stability 

i 

75.  The  finite  element  algorithm  upon  which  RMA-10  is  constructed  is  J 

the  Galerkin  weak  statement,  ie.,  the  Galerkin  criteria  for  the  weight  func-  ' 

tions  in  a  weighted  residual  statement,  as  expressed  in  Equation  80.  It  has  i 

i 

become  well  verified  in  the  last  few  years  that  the  Galerkin  formulation  is  1 

very  limited  in  terms  of  stability  for  problem  definitions  with  significant  I 

fluid  convection,  ie.,  large  Reynolds-Peclet  number  fluid-thermal  flows. 

Experience  in  using  the  RMA  codes  at  WES  for  certain  problem  simulations, 
especially  those  with  significant  bed  profiles,  confirms  that  destabilizing 
mechanisms  can  become  evident  and  a  problem.  One  "numerical  cure"  for  this  is  j 

to  increase  the  "eddy  viscosity"  coefficient,  the  essence  of  which  is  to  re¬ 
duce  the  computational  Reynolds  number  to  that  required  for  Galerkin  stabil-  ‘ 

ity.  Unfortunately,  this  severely  compromises  the  ability  to  simulate  flows 
with  realistic  Reynolds  numbers.  One  goal  in  the  reformulation  would  be  to 
moderate  the  stability  perturbations  induced  by  geometrically-signif icant  bed 
profiles  as  well  as  the  dynamic  free  surface.  The  next  step  of  this  reformu-  i 

lation  analysis  is  to  thus  enhance  algorithm  stability  performance  in  a  robust 
and  mathematically  consistent  way. 

76.  The  numerous  artificial  diffusion  (viscosity)  methods  devised  for 
finite  difference  CFD  methods,  as  well  as  essentially  all  of  the  comparison 
finite  element  constructs,  eg.,  Petrov-Galerkin,  Taylor-Galerkin,  character- 
istic-Galerkin,  penalty-Galerkin, . . ,  have  been  firmly  established  as  belonging 
to  the  family  of  Taylor  weak  statements  (TWS)  by  Baker  and  Kim  (1987).  Rather 
than  arbitrarily  adding  diffusion-type  terms  to  the  algorithm  statement,  Equa¬ 
tion  80,  the  TWS  procedure  generates  appropriate  functional  expressions  for 
any  specific  conservation  law  statement,  e.g.  Equation  77.  Realizing  that  the 
coordinate  transformation  is  a  formulational  detail,  neglecting  the  associated 
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notat ional  complication,  expand  the  parent  expression  (Equation  62)  in  a 
Taylor  series. 
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where  subscript  "t"  denotes  order  of  temporal  derivative  at  t  and 

Cn+1  =  Cn  +  '  Equation  62  allows  restatement  of  the  time  derivatives  in 

Equation  86,  neglecting  the  source  term  s,  as 
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The  matrix  A  is  the  Jacobian  of  the  flux  vector  f  ,  the  coefficients  a  , 

8  ,  y  ,  and  p  are  arbitrary,  to  within  a  convex  sum  constraint,  and  their 
specification  ultimately  reflects  choices  to  be  made  for  the  final  functional 
form.  Combining  Equations  86-89  and  collecting  terms  yields. 


n+1  n 


At 


^  =  -fx +  r(“ Aq* +  *  aO  +  * 


n  At2 


x  /  6 


^  (a2)x+a2  yqt 
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(90) 


+  ... 


The  left  side  of  Equation  90  expresses  the  discrete  approximation  to  qf  . 
Assuming  the  limit  is  taken.  Equation  90  can  then  be  re-expressed  as  At  -*•  0 
without  temporal  approximation  (but  retaining  the  higher  order  terms)  as 


Lw(q}  =  qt  '  r  (“  A  ^  Y  l^)*  +  ^  * 


+  fx  -  f  A  0  -  ^<A\  +  a2  fc)fx  -  ••• 
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77.  Equation  91  is  the  conservation  law  form  desired  for  constructing 
the  finite  element  Taylor  weak  statement  algorithm  wherein  At  will  not 
vanish.  Importantly,  Equation  91  is  identically  satisfied  by  the  exact  solu¬ 
tion  to  Equation  62  for  a  =  6  and  y  =  p  upon  reintroduction  of  the  source 
term. 

78.  Constructing  the  finite  element  weak  statement  for  Equation  91 

follows  the  standard  Galerkin  finite  element  procedure.  The  approximation 
statements.  Equation  78-79  are  unchanged,  and  in  Equation  80,  L(q^)  is  re¬ 

placed  by  L  (q  )  .  Letting  {•}  denote  the  global  array  of  (Galerkin) 

W  tl 

weighting  functions  {N},  replacing  f^  everywhere  by  Aq^  ,  and  reintroducing 
the  source  term  s  ,  the  TWS  algorithm  replaces  Equation  80  with  the 
expression. 


m/R  +  6Ahqxjdx 


+  It2  +  uAhqMdx  =  {0} 


I'ca 


The  first  term  in  Equation  92  is  the  classic  Galerkin  statement,  while  over  a 
dozen  (dissipative)  FD  and  PE  algorithms  are  recognized  (solely)  by  the 
designer's  choice  for  the  approximation  (A^1)  to  the  Jacobian  and  the  coeffi¬ 
cient  set  a  ,  8  ,  y  ,  y  in  the  second  term,  cf.,  Baker  and  Kim  (1987). 

79.  Figure  6  summarizes  numerical  results  published  by  Baker  and  Kim 
(1987)  for  sample  ID  hyperbolic  problems  of  unsteady  and  steady-state  solu¬ 
tions  for  linear  and  non-linear  inviscid  test  problems.  The  implicit  Galerkin 
algorithm  sets  the  last  two  terms  in  Equation  92  to  zero  (as  utilized  for 
RMA-10),  and  propagates  the  linear  unsteady  sine  wave  with  little  diffusion 
but  significant  dispersion  error  (trailing  wake).  Figure  6a.  The  steady 
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linear  waLL-layer  solution  for  Pe=I80,  Fig.  6b,  confirms  cascading  . . f  the  dis¬ 
persion  error  into  the  2  Ax  oscillations  intrinsic  to  Galerkin  methods.  In  tin- 
non-linear  stationary  shock.  Figure  6c,  this  basic  instability  cascades  non- 
linearly  to  turn  the  solution  to  trash.  In  distinction,  the  Euler-T.i'l  or- 
Galerkin  (ETC.)  algorithm  of  Donea  (1984),  corresponding  to  a=0  =  {i  and  p-l=y 
in  Equation  92,  does  an  excellent  job  of  controlling  dispersion  error  for  the 
unsteady  linear  travelling  wave,  Figure  6d.  Conversely,  the  penal t y-Galerkin 
(pC.)  algorithm  of  Baker  (1983),  a=l/2=B  and  y=0=p  in  Equation  92  does  an 

excellent  job  for  the  steady-state  linear  wall-layer  and  steady-state  non¬ 
linear  shock.  Figure  6e-f . 

80.  This  step  will  develop  and  derive  the  TVS  finite  element  algorithm 
for  the  hydromechanics  conservation  law  restatements  as  derived  before.  The 
goal  is  to  establish  a  mathematically  robust,  highly  phase-selective  dissipa¬ 
tive  Galerkin  algorithm  and  to  thus  identify  needed  modifications  to  the 
RMA-10  algorithm  (successor)  to  permit  consistent  stabilization  for  simula¬ 
tions  with  large  convection  velocities  and/or  significant  bed  profiles. 

The  Linear  Algebra  Statement  for  Supercomputing 

81.  Step  4  completes  the  detailed  derivation  of  the  TVS  finite  element 
algorithm  form.  Equation  92,  for  recasting,  extension  and/or  total  restatement 
of  the  RMA-10  algorithm.  While  this  involves  great  detail,  upon  completion  of 
the  indicated  integrals,  Equation  92  ultimately  becomes  an  ordinary  differen¬ 
tial  equation  written  for  the  time  evolution  of  the  expansion  coefficient  set 
{Q},  Equation  79,  of  the  form: 


h((Q} )  =  +  (Eh  -  Eh  }  +  { Fh  -  Fh  }  +  {Gh-  Gh}_  +  Sh  -  At (Eh(a , 6 ,A) 

dt  vE  vn  v£  £n 


Fh(a,8,A)  +  {Gn(a,B,A)l  |  -A  t 2 ( { Eh (a , B , A2)  4  ...])  (93) 

r,n  Cn  n 

The  first  line  of  terms  in  Equation  93  constitutes  the  conventional  Galerkin 
weak  statement  for  the  governing  Equation  77,  including  the  grid  speed  and 
source  terms,  and  expressed  in  the  (£>n>4)  transformed  coordinates.  The 
second  and  third  groups  of  terms  constitute  the  Taylor  weak  statement 
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Figure  6.  Comparison  of  Various  Finite  Element  &  Algorithm  Solutions  to 
Hyperbolic  Test  Problems,  from  Baker  and  Kim  (1987) 
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additions  for  embedding  stability  mechanisms  according  to  the  definitions  for 


ot  ,  8  ,  y  ,  a  and  A  . 


82.  The  TVS  finite  element  algorithm  is  ultimately  reduced  to  a  linear 
algebra  statement  for  coding  design  for  a  supercomputer  environment.  For 
notational  simplicity,  functionally  combining  terms  in  Equation  93  yields  the 


ODE  matrix  statement 


M  (a.Y)  ~~  +  (R  ( {Q } ,  8,  U,  (S),  A,  At)}  =  {0} 


The  specific  construction  for  M(*)  and  { R ( *  > }  depends  explicitly  on  the 


choice  of  the  trial  space  basis  ({N^})  and  the  TVS  parameter  choices.  Once 


selected,  Equation  94  permits  evaluation  of  the  time-derivative  of  the  dis¬ 


crete  Taylor  series, 


{Q}  =  (Ql  +  A  t 

n+1  n  dt  n+0 


where  1/2  <  0  _<  1  for  the  required  implicit  algorithm.  Substituting  Equa¬ 
tion  94,  arranging  terms  and  writing  the  resultant  in  homogeneous  form  yields 
the  final  linear  algebra  statement. 


{F}  =  M  {AQ}  +  At  (9{R)n+1  +  (1  -  0)  (Rln 


where  {AQ}  =  {Q} 


{Q}  and  subscript  (n+1,  n)  denotes  evaluation  at  the 


corresponding  time  level.  Linearizing  around  t^  and  dividing  through  by  At 


yields  the  Taylor  weak  statement  computational  linear  algebra  statement  in 
delta  form  as. 


1  M  .  3{R) 

n  m  +  6  aTqT 


{AQ}  =  -  {R} 


83.  The  form  of  Equation  97  is  ideally  suited  to  rapid  determination  of 
the  steady-state  solution,  as  well  as  t ime-accurate  predictions  of  the  tran¬ 
sient.  The  solution  converges  when  {R(*)}  =  {0}  ,  hence  depends  only  on  the 
Taylor  weak  statement  parameters  8,  u ,  and  A  .  The  rate  of  approach  to 


Fqua- 


steady-state  depends  on  a,  6,  Y,  p,  At  and  0,  and  in  the  limit  At  -*  «> 
tion  97  is  a  Newton  algorithm.  The  choice  of  an  implicit  integration  proce¬ 
dure  is  crucial  to  the  algorithm,  and  a  wide  range  of  choices  affecting 
stability  and  convergence  rate  exists  within  the  Taylor  weak  statement. 

8m.  Equation  97  ultimately  resides  in  the  (supercomputer)  code,  and  the 
matrix  iterative  solution  procedure  to  be  selected  is  crucial  to  overall 
efficiency,  hence  utility  of  the  Taylor  weak  statement  finite  element  algo¬ 
rithm.  The  RMA-10  code  now  uses  a  decoupled  full-dimensional  approximation  to 
the  Jacobian  [M/At  +  . ..],  in  Equation  97,  which  is  a  candidate  for  replace¬ 
ment  in  a  supercomputer  implementation.  The  embedded  coordinate  transforma¬ 
tion  Isas  conveniently  established  "generalized  implicit  lines  (GILS)",  cf., 

Lin  (1985),  upon  which  an  efficient  factorization  of  the  Jacobian  can  be  con¬ 
structed.  Therefore,  the  appropriate  procedure  appears  to  be  a  tensor  matrix 
product  approximation  to  the  Jacobian,  whereupon  the  left  side  of  Equation  97 
is  replaced  by  a  factored  form 

r  T 


1  M  +  A  3{R} 

a¥m  +  6  3TqT 


(98) 


The  factors  J  and  J^  ,  spanning  the  transverse  (horizontal)  plane  will  be 


classical  constructions,  i.e., 


M  +  9 
n 


3{R} 


3TqT  n 


m  +  rMr}  r 

ME  +  63TqT  C 


(99) 


each  of  which  is  block  (2k+l)  diagonal,  where  k  is  the  completeness  degree 


of  the  finite  element  trial  space  basis  (N^)  .  The  vertical  direction  factor 


J ^  can  be  arranged  to  account  for  the  continuity  equation  integration,  but 


otherwise  will  be  the  classical  formulation,  i.e., 


1  3{R}c 

IT  \  +  6  aTq'R 


(100) 


85.  This  step  completes  the  definitions  of  the  Jacobian  factorization. 
Equations  98-100,  as  a  function  of  A  and  a,  6,  y>  V  and  0  of  the  TWS  algo¬ 
rithm  statement . 
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Design  for  the  Supercomputer  Environment 


86.  The  first  five  steps  have  lead  to  this  critical  point,  which  anti¬ 
cipates  that  a  major  redesign  of  the  RMA-10  linear  algebra  solution  procedure 
is  critical  to  an  efficient  supercomputer  embodiment  (code)  of  the  algorithm. 
It  is  important  to  emphasize  that  this  redesign  focuses  on  the  very  internal 
workings  only  of  the  code,  ie.,  the  computationally  dense  DO  loops  over  ele¬ 
ments  that  form  Equation  96  and  manage  the  (proposed)  solution  steps.  Equa¬ 
tions  97-100.  The  entire  "external"  hierarchy  regarding  input/output,  the 
element  library,  grid  generation,  graphics,  etc.,  can  be  retained  unchanged. 

It  is  well  understood  that  these  portions  constitute  the  major  fraction  of  any 
working  code  and  that  they  remain  essentially  intact  (on  the  host  computer) 
while  the  intense  number  crunching  operations  are  moved  over  to  the  supercom¬ 
puter  environment. 

87.  A  second  important  point  regarding  step  5  is  that  the  suggested  use 
of  tensor  matrix  factorization  of  the  linear  algebra  Jacobian  in  no  way  limits 
the  geometric  generality  of  the  finite  element  algorithm.  Because  of  the 
hydrostatic  assumption,  leading  to  solution  of  the  continuity  equation  in  the 
z-direction  as  an  initial-value  problem,  there  is  a  corresponding  organization 
to  the  3D  mesh,  recall  Figure  3.  This  is  recognized  in  Equation  100,  and 
there  is  no  requirement  that  these  columns  contain  the  same  number  of  nodes, 
ns  is  the  case  with  an  approximately-factored  finite  difference  algorithm. 

The  concept  of  tensor  matrix  products  and  sweeping  on  (curvilinear)  paths  is 
directly  applicable  to  the  3D  mesh  equivalent  of  that  illustrated  in  Fig¬ 
ure  3b,  with  the  embedded  tri-angular  (ie.,  tetrahedron/pentahedron)  elements 
providing  required  geometry  transitions  along  the  bed  profile  as  presently 
done  in  RMA-10.  It  is  also  important  to  note  that  the  "vertical"  coordinate 
lines  need  not  be  exactly  parallel  and/or  vertical,  since  the  continuity  equa¬ 
tion  algorithm  redesign  will  become  re-expressed  In  terms  of  the  contra- 
variant  velocity  components  which  are  always  parallel  to  (transformed)  coordi¬ 
nate  curves.  The  tensor  transformation  law  is  then  used  to  provide  the 
required  physical  velocity  components. 


88.  Step  6  identifies  the  key  matching  facets  of  the  RMA-10  algorithm 
restatement  with  the  supercomputer  vector /multi-processing/parallel  processing 
environment.  For  this,  construction  of  Equation  96  and  97  readily  generalizes 
on  the  concept  of  a  general  implicit  line. 
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The  two  basic  loops  for  solution  of  Equations  96  and  97  therefore,  are: 

3 

1.  Over  dependent  variables  on  N  grid  cells,  form  {F}={F(3D  geom, 

Q,  ,  ,  ...)  and} 

3 

2.  For  an  N  grid  and  all  variables  on  three  general  planes,  form 
J^=J^  (ID  geom,  U  ,  ...) 

3.  Solve  A,  B,  {Q}  =  f  (J^  ,  data). 

Thus,  the  linear  algebra  solution  sequence  defined  by  Equation  98  effectively 
3  2 

sub-divides  an  N  problems  into  3N  problems  of  N  length  without  loss  of 
accuracy  or  connectivity ,  making  the  algorithm  construction  highly  suited  for 
concurrent  (MIMD)  parallel  processing  implementation. 

89.  In  addition  to  parallelism  derived  from  the  factoring  of  Equa¬ 
tion  97,  parallelism  also  occurs  throughout  the  algorithm  at  various  levels 
including: 

1.  Independence  of  {F}  and  in  the  same  time  period,  as 

illustrated  in  Figure  7. 

2.  Parallel  formation  of  and  on  different  planes,  as 

illustrated  in  Figure  8. 

3.  Simultaneous  formation  of  equation  terms  for  the  {F}  and  J. 
arrays.  The  terms  are  position  independent  in  each  differential 
equation. 

4.  Parallel  and  systolic  methods  for  solving  the  formed  system  of 
equations . 

90.  Figure  7  is  a  schematic  diagram  of  logic  and  data  flow  for  the 
defined  general  implicit  line  algorithm.  The  processing  direction  is  gen¬ 
erally  clock-wise  beginning  at  the  upper  left.  The  vertical  divisions  (sweep 
1,2,3)  indicate  operations  performed  in  each  sweep  direction.  The  rectangular 
boxes  are  the  operations  performed  and  the  bracketed  symbols  represent  inter¬ 
mediate  data  temporarily  retained  for  use  in  the  next  set  of  operations.  In 
each  sweep  direction  a  loop  over  a  row  of  the  discretized  geometry  builds  an 
equation  system  which  is  solved  using  Gauss  elimination.  At  the  end  of  sweep 
direction  3,  Equation  98  is  complete  and  a  time  step  is  taken  proceeding  from 
two  to  one.  Closure  data  is  evaluated  at  the  new  time  step  and  the  cycle  is 
re-initiated . 

91.  Upon  closer  examination  of  the  serial  processes  in  Figure  7,  and 
with  an  understanding  of  pipelining,  it  becames  obvious  that  any  serial  set  of 
operations  can  be  made  parallel  through  the  use  of  pipelining  (also  defined  as 
systolic,  Rung,  1980).  The  simple  requirement  is  that  the  processors  involved 


are  synchronized  such  that  the  output  from  pairs  of  processors  forms  the  input 
to  the  next  processor.  The  output  can  be  bit,  byte,  word,  or  array  sized, 
utilizing  appropriate  intermediate  storage  for  proper  synchronization.  Hence, 
all  processors  are  continuously  busy  working  on  the  same  job  in  different  time 
slots  of  job  completion. 

92.  Pipelining  is  illustrated  in  Figure  7.  If  { F }  and  J  are  swept 
in  planes  normal  to  the  x^  direction,  the  completion  of  the  first  direction 
sweep  in  each  plane  produces  the  data  (A)  required  for  the  second  direction 
sweep  in  that  plane.  Hence,  the  first  and  second  direction  sweeps,  which  are 
defined  serially,  can  be  made  parallel  through  a  processing  time  delay  and  use 
of  an  intermediate  buffer  array  to  hold  the  intermediate  data. 

93.  Each  differential  equation  term  in  (f)  and  its  derivative  J 
consists  of  products  and  sums  which  are  position  independent.  This  means  that 
the  terms  can  be  evaluated  in  parallel  and  finally  combined  for  each  gridpoint 
on  a  sweepline.  Parallelism  at  this  level  has  the  capability  to  consume  large 
quantities  of  problems  data.  The  utility  of  this  parallelism,  therefore,  is 
strongly  dependent  upon  memory /processor  speed  ratio,  data  communication  speed 
and  the  number  of  paths  between  memory  and  processors. 

94.  Finally,  parallel  processing  of  the  equation  solving  process  is  a 
consideration.  Due  to  its  broad  application  arena,  this  area  has  received 
considerable  attention  by  various  researchers  e.g.,  Ortega  and  Voight  (1985). 
Also,  as  illustrated  in  Figure  9,  the  equation  solving  process  represents  only 
about  15%  of  the  work  required  for  a  tensor  product  factored  CFD  solution. 

The  formation  of  the  coefficients  and  right  side  (F)  account  for  85%;  there¬ 
fore,  this  step  concentrates  on  parallelism  in  the  formation  aspects  of  the 
(f)  and  components,  and  re-assembly  of  the  data  for  an  efficient,  paral¬ 

lel,  iterative  solution. 

Data  organization 

95.  Local  spatial  coupling  of  data  is  an  extremely  important  character¬ 
istic  of  numerical  PDE  algorithms.  This  coupling  requirement  complicates  the 


separation  of  problem  data  for  parallel  processing.  This  suggests  the  forma¬ 
tion  of  data  in  hierarchical  packets  with  certain  data  repeated  among  the 
packets  to  meet  the  local  coupling  requirements.  Packets  at  the  top  of  the 
hierarchy  would  contain  a  few  large  groups  of  data  with  less  repetition,  while 
packets  at  the  bottom  of  the  hierarchy  would  be  many  and  small  and  be  more 
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repetitious.  Hence,  the  data  would  be  distributed  among  parallel  machines  for 
concurrent  processing. 

96.  Full  categorization  of  parallel  and  concurrent  processing  will  re¬ 
quire  definition  of  the  concurrency  constraints  and  penalty  imposed  by  data 
unavailability  of  each  level  of  the  hierarchy.  Categorical  subdivision  of  the 
data  into  specified  problem  data  and  partially  processed  data  (intermediate 
data)  provides  some  insight.  The  problem  data  is  relatively  stagnant,  chang¬ 
ing  only  occasionally  toward  the  top  of  the  hierarchy.  Intermediate  data,  on 
the  other  hand,  must  be  accumulated  for  portions  of  the  algorithm  at  lower 
hierarchal  levels.  At  each  level,  progressively  smaller  memories  or  registers 
(buffers)  serve  to  hold  data  until  it  can  be  processed.  Ideally,  each  buffer 
has  a  path  to  all  processors  requiring  the  memory  data,  and  the  data  is  "used 
up"  requiring  no  further  transfer. 

97.  Since  data  storage  and  communication  are  increasingly  important 
factors  as  the  number  of  processors  increases,  a  system  for  decreasing  data 
motion  is  beneficial  for  parallel  processing.  In  a  synchronous  system,  data 
must  be  available  to  the  processors  each  cycle  in  order  to  maintain  effective 
processor  use.  Data  manipulation  and  overhead  are  the  principal  culprits  in 
current  supercomputer  inability  to  maintain  optimal  speed.  The  need  to  con¬ 
tinually  rearrange  data  into  long  vectors  for  efficient  pipeline  operations 
wastes  cycle  time  and  adds  a  tremendous  overhead  burden  to  an  algorithm. 

98.  One  method  for  significantly  reducing  this  problem  for  parallel 
processing  is  through  use  of  indirect  addressing  pointers.  The  problem  data 
is  stored  in  a  global  set  and  pointers  are  generated  for  random  data  selection 
from  the  global  set.  Using  this  method,  data  manipulation  is  minimized  with 
low  overhead  penalty  and  lower  level  packet  data  repetition  can  be  minimized. 

99.  Figure  10  illustrates  an  indirect  addressing  scheme  for  the  { F } 
array  problem  data  which  uncouples  the  global  geometry  and  the  physics.  The 
problem  description  data  is  represented  in  the  lower  right  of  Figure  10(a)  as 
two  dimensional  arrays.  Four  types  of  data  are  required  for  complete  physical 
de  scriptions.  These  are:  initial  values  of  the  dependent  variables  (Q) , 
spatially  distributed  parameter  coefficients  (P) ,  local  field  constants  (C) , 
and  differential  operator  arrays  (M) .  Each  of  the  data  categories  are  stored 
in  two  dimensional  arrays  for  ROW/COLUMN  addressing.  The  row  pointers  are 
derived  from  geometric  indices  (10b,  10c)  which  are  a  function  of  geometric 


location  (grid  points).  The  column  pointers  (lOd)  select  the  differential 
equation  variables. 

Hardware  variables  and  constraints 

100.  Computer  hardware  architectures  for  parallel  processing  are  as 
diverse  as  algorithm  variations  in  CFD.  Each  architectural  design  has  its 
strong  and  weak  points,  each  of  which  may  be  dependent  upon  the  specific  ap¬ 
plication  involved.  It  appears,  however,  that  there  are  certain  design  crite¬ 
ria  which  can  be  utilized  to  determine  the  ultimate  effective  speed  of  a 
machine . 

101.  As  computational  chip  speeds  increase,  it  is  becoming  increasingly 
apparent  that  the  effectiveness  of  a  parallel  architecture  is  ultimately  mea¬ 
sured  in  the  ability  to  communicate  the  data  from  memory  to  processor  and  be¬ 
tween  processors.  In  a  parallel  system,  the  limit  of  the  number  of  processors 
effectively  used  is  determined  by  the  ability  to  get  the  data  to  the  processor 
at  every  time  cycle.  The  inability  to  do  this  wastes  processor  cycles  and, 
therefore,  compromises  solution  speed. 

102.  Assuming  two  inputs  and  one  output  are  required  for  each  processor 
operating  from  a  single  memory  bank,  for  example,  the  memory  access  speed  to 
keep  six  processors  busy  must  be  at  least  18  times  the  speed  of  each  proces¬ 
sor.  If  the  memory  access  speed  is  only  12  times  the  speed  of  each  processor, 
for  example,  a  bottleneck  will  result  and  processor  cycles  will  be  wasted 
waiting  for  the  next  data  to  arrive. 

103.  Some  of  these  machine  limitations,  however,  can  be  circumvented  at 
the  algorithm  level.  In  the  above  example,  the  problem  might  be  altered  to 
allow  for  four  of  the  processors  to  operate  in  parallel  on  data  from  memory 
and  the  other  two  processors  to  operate  on  data  from  the  output  of  the  four 
processors  in  pipeline  fashion.  This  would  reduce  the  memory  access  require¬ 
ments  from  18  to  10  accesses  per  processor  cycle,  thus  keeping  all  processors 
continuously  busy.  The  limiting  factor  for  pipelines,  however,  is  pipeline 
initiatlon/end  cost  which  in  this  case  is  two  process  cycles. 

104.  By  knowing  the  hardware  memory  and  processor  speeds,  together  with 
communication  links,  therefore,  algorithm  components  may  be  made  to  keep  the 
hardware  busy.  Since  these  relative  numbers  of  hardware  components  are  dif¬ 
ferent  for  different  machines,  the  resulting  detailed  algorithm  structure  will 
appear  differently  on  each  machine. 

105.  Each  of  these  precepts  was  independently  analyzed  to  construct  the 


COMSYS  (TM)  aerodynamics  algorithm,  resulting  in  a  prototype  vector/parallel 
processing  structure.  This  was  then  coded  (in  Fortran)  for  testing  and  veri¬ 
fication  runs  on  a  serial  computer,  ie.,  a  Cyber-1 70/835 .  This  machine  then 
serves  as  the  host  for  an  echos  simulation  of  the  assembly  language  code,  as 
well  as  the  actual  execution  on  a  parallel  processor  which  in  practice  to  date 
has  been  a  Control  Data  Cyberplus  attached  parallel  processor. 

106.  A  (each)  Cyberplus  machine  consists  of  a  base  (integer)  processor 
containing  11  units,  a  floating  point  processor  having  3  units,  and  a  512-K 
word  high  performance  memory  (HPM) .  All  processor  and  memory  interfaces  oper¬ 
ate  simultaneously  on  a  20-ns  clock,  and  communicate  through  a  fully  connected 
crossbar  on  every  cycle.  The  HPM  is  used  to  maintain  a  plane  of  dense  mesh 
raw  data,  together  with  intermediate  data  awaiting  sequential  use.  The  inte¬ 
ger  finite  element  mesh  connection  data  string  is  a  critically  important  speed 
feature  for  parallel  processing  that  heretofore  has  saddled  serial  machine 
finite  element  code  implementations  with  speeds  a  factor  of  2-4  slower  than 
finite  difference  codes.  These  data  are  transferred  from  the  HPM  to  the  base 
processor  memories  and  indices  pointing  to  problem  data  stored  in  the  HPM  are 
calculated.  In  the  floating  point  processor,  multiply  and  add  units  operate 
on  pre-selected  data  to  form  the  problem  solution  arrays. 

107.  Example  executions  of  a  model  3D  problem  discretization  using  bi- 

6  2  2  2 
linear  hexahedron  elements  on  meshes  up  to  10  nodes  (ie.,  10  x  10  x  10  ) 

have  a  Cyberplus  speedup  factor  of  70  over  host  speed.  The  Cyber  170-835 
host  speed  is  about  one  Mflop,  yielding  a  predicted  operating  speed  on  Cyber¬ 
plus  of  62  Mflop.  The  base  and  floating  point  processors  in  Cyberplus  are 
high  performance  (50  Mflop  each),  hence  the  present  verification  has  achieved 
about  607  of  the  available  speed. 

108.  A  host  of  parallel  architectures  (Figure  11)  have  recently  come 
into  existence  and  are  available  for  algorithm  testing.  Many  of  these  use 
standard  market  chips  such  as  the  Motorola  Series  6800.  While  software  is 
limited  for  these  machines,  many  have  some  Fortran  capability  under  Unix. 

This  utility,  together  with  strong  assembly  language  capability,  provides  the 
basis  for  coding  and  testing  various  aspects  of  algorithm  parallelism  as  was 
done  on  the  Cyberplus. 

Example,  FLEX/32  Architecture  Variations 

109.  With  the  background  established,  algorithm/architecture  variations 
for  real  machine  configurations  require  testing.  The  enhanced  FLEX/32 
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Figure  11.  Parallel  Architectures  Sample 
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parallel  processor  provides  an  ideal  setting  to  study  these  phenomena.  Each 
cabinet  can  contain  up  to  20  processors  and  30  high  speed  co-processors,  large- 
main  memory  of  up  to  18  megabytes  and  local  memories  for  each  processor  of  up 
to  one  megabyte  each.  The  processors  and  memories  can  be  interconnected  vari¬ 
ously  to  provide  a  system  adaptable  to  the  optimal  configuration  defined  by 
the  parallelism  inherent  in  each  algorithm  component. 

110.  On  the  FLEX/32  system  both  parallel  (M1MD)  and  pipelined 
(SYSTOLIC)  algorithm  configurations  can  be  tested  within  the  same  framework. 
The  comparative  machine  architectures  are  illustrated  in  Figure  12  where  CCC 
and  CAC  represent  common  memory  access  and  each  C2C  represents  68020-based 
chips  in  parallel  with  micro-code-capable  computer  with  high  speed  floating 
point  processor.  In  Figure  12a,  the  global  memory  across  the  top  is  conveyed 
to  all  local  processor/memory  systems  (C2C)  equally  for  parallel  processing. 

In  Figure  12b,  the  global  memory  feeds  a  single  computer  and  results  are  cas¬ 
caded  to  other  computers,  thus  reducing  common  memory  access  speed  require¬ 
ments  for  the  same  number  of  processors.  Since  each  of  these  configurations 
is  anticipated  as  optimal  for  portions  of  a  candidate  PDE  algorithm,  the 
FLF.X/32  architecture  provides  an  ideal  test  bed. 

111.  Coding  of  the  learned  CFS/parallel  concepts  on  a  FLEX/32  provides 
the  real  machine  environment  experience  which  exposes  the  solution  capability 
and  limitations. 


a)  Parallel  Configuration 


b)  Pipelined  Configuration 


Figure  12.  FLEX/32  Multicomputer  Configurations 
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PART  V:  SUMMARY  AND  CONCLUSIONS 


Summary 


112.  Despite  its  ability  to  resolve  complex  geometry  in  computat ional 
domains  such  as  estuaries,  rivers  and  coastal  areas,  the  finite  element 
method,  as  commonly  applied  on  unstructured  grids,  suffers  from  the  fact  that 
it  requires  excessive  computational  effort  per  time-step.  However,  this  can 
be  attributed  more  to  the  techniques  commonly  employed  by  finite  element 
modelers  in  solving  the  matrix  equation  system  than  the  method  itself. 

113.  This  study  was  conducted  with  three  objectives  in  mind.  The  first 
was  to  provide  the  average  reader  with  an  increased  understanding  of  the  two 
computational  methods  most  often  applied  in  computational  fluid  mechanics; 
namely,  finite  differences  and  finite  elements.  Hopefully  the  discussion  pro¬ 
vided  has  amply  demonstrated  that  the  two  methods  share  common  ground.  As 
noted,  all  computational  methods  can  be  cast  into  the  form  of  a  weighted  re¬ 
sidual  statement.  Different  methods  result  depending  upon  the  selection  of 
the  weighting  functions  and/or  stability  constraints.  The  second  objective  of 
the  study  was  to  offer  suggestions  or  ideas  that  might  be  implemented  in  a  re¬ 
latively  short  timeframe  to  reduce  the  computational  costs  of  typical  finite 
element  hydrodynamic  models.  Ideas  such  as  implementing  an  iterative  solver 
for  the  linear  algebra  problem,  employing  an  explicit  time  integration  scheme 
with  a  condensed  matrix,  utilizing  only  triangular  type  elements  to  eliminate 
the  computation  time  associated  with  numerical  quadrature,  and  uncoupling  the 
dependent  variables  should  all  be  considered.  The  final  objective  was  to 
present  a  series  of  steps  to  be  accomplished  over  a  longer  time  frame  that 
will  ultimately  result  in  not  only  a  computationally  efficient  model  but  also 
a  robust  one  providing  accurate  solutions  while  possessing  desirable  stability 
characteristics . 

114.  The  construction  depends  upon  the  use  of  a  grid  with  some  degree 
of  structure,  although,  a  completely  structured  grid  is  not  required.  With  a 
semi-structured  grid,  the  coefficient  matrix;  i.e.,  the  Jacobian,  associated 
with  the  linear  algebra  problem  can  be  factored  such  that  computational  sweeps 
along  curvilinear  coordinate  lines  can  be  made.  With  a  linear  basis  function, 
the  linear  algebra  problem  reduces  to  block-tridiagonal  systems  in  each  direc¬ 
tion  which  can  easily  be  solved.  Of  course,  higher  order  basis  functions  will 


result  in  pentadiagonal,  etc.,  systems.  Such  an  algorithm  programmed  lor  so¬ 
lution  in  a  parallel  processing  environment  will  result  in  an  extreme! v  effi¬ 
cient  computational  model  for  computing  free  surface  hydrodynamics  over  long 
periods  of  times.  In  addition,  with  the  finite  element  algorithm  based  upon 
the  Taylor  weak  statement,  the  numerical  model  will  have  stability  mechanisms 
embedded  in  the  algorithm  that  will  provide  for  selective  dampening,  resulting 
in  accurate  solutions  in  high  gradient  regions. 

Conclusions 

115.  The  first  conclusion  from  this  effort  is  that  completely  implicit 
three  dimensional  finite  element  models  that  are  programed  for  solution  on 
completely  unstructured  grids  cannot  be  considered  for  long  term  flow  computa¬ 
tions,  e.g.  weeks  to  months.  In  fact  it  is  questionable  if  such  models  can 
even  be  considered  for  studies  that  only  involve  a  few  tidal  cycles. 

116.  If  unstructured  grids  are  considered  essential,  probably  the  best 
short  term  solution  is  to  employ  a  split  mode  approach.  The  external  model 
would  consist  of  implicitly  solving  a  frictionally  damped  wave  equation  for 
the  water  surface  elevations,  with  the  vertically  averaged  velocities  and 
salinity  and/or  temperature  computed  explicitly  using  the  condensed  matrix 
concept.  The  internal  model,  which  computes  the  deviations  from  the  verti¬ 
cally  averaged  components,  would  then  also  employ  an  explicit  time  integration 
scheme  with  condensed  matrices. 

117.  Perhaps  the  simplest  approach  for  implementation  in  the  short  term 
for  an  unstructured  grid  model  to  be  applied  over  at  most  a  few  tidal  cycles 
would  be  to  leave  the  existing  finite  element  algorithm  intact  but  employ  an 
explicit  time  integration  scheme  with  a  condensed  matrix.  Even  though  the 
computational  time  step  will  be  limited  by  the  speed  of  a  free  surface  gravity 
wave,  the  linear  algebra  problem  at  each  time  step  becomes  a  trivial  opera¬ 
tion.  Therefore,  the  computational  costs  should  be  less  than  that  required  In 
the  implicit  model  as  long  as  the  element  size  does  not  become  so  ridiculously 
small  that  extremely  small  time  steps  are  required. 

118.  To  achieve  not  only  a  computationally  efficient  but  also  a  stable 
3D  time-accurate  finite  element  hydrodynamic  model,  the  tensor  product  algo¬ 
rithm  developed  from  a  Taylor  weak  statement  should  be  implemented.  Such 
factorizations  require  some  degree  of  regularity  for  the  grid;  however, 


through  the  use  of  block  structured  grids  a  significant  degree  of  generality 
can  still  be  achieved.  In  the  long  term,  it  is  believed  that  such  a  model, 
programed  for  parallel  processing,  offers  the  best  hope  for  3D  time-varying 
1 inite  element  hvdrodynamic  computations. 
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